On Oct 31, 2009, at 12:26 PM, Eduard Moraru wrote:

> Context left and context right could be an idea.
>
> However, what do you do about the static size of the context when, for
> example, you have a 500 character document and you make only 2  
> annotations?
> That results in storing 2x300 = 600 characters in just 2  
> annotations. That
> is already duplicating the document's content in size. If you make
> additional annotations, you duplicate the document several times.
>
That's a tradeoff of course...

The underlying problem is that in many XWiki documents the content users finally see is generated in some way (think of a blog post, or the Watch news coming from objects).

We need to find a way to map what the user sees to where it comes from.

Now, since XWiki allows you to display everything using several Turing-complete mechanisms (Groovy scripts, Velocity, etc.), making this mapping implies either understanding what's coded in a page (not possible), forcing the author to somehow "mark" the source of the content in their script (impractical), or giving the user a constrained scripting language where this information is made explicit (limiting).

The solution is to apply heuristics in order to retrieve annotations in the text the user really sees: we called this the "Canonical Representation", which basically corresponds to the XDOM after the transformations and before the rendering. In this way we don't really care where the annotated content comes from. As long as it's there and we are able to recognize and locate it, we can display it as annotated content. If we are unable to do so, then we simply don't display the annotation.
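To give an idea of what I mean by "recognize and locate", the core of the matching could be as simple as the following sketch (Java, but all names here are invented for illustration, this is not actual XWiki API):

    public final class AnnotationLocator
    {
        /**
         * @return the start offset of the selection in the canonical text,
         *         or -1 if the annotation cannot be placed
         */
        public int locate(String canonicalText, String contextLeft,
            String selection, String contextRight)
        {
            // Best case: the whole "contextLeft + selection + contextRight"
            // block occurs exactly once in the canonical text.
            String block = contextLeft + selection + contextRight;
            int first = canonicalText.indexOf(block);
            if (first >= 0 && canonicalText.indexOf(block, first + 1) < 0) {
                return first + contextLeft.length();
            }
            // Fallback: the selection alone occurs exactly once.
            first = canonicalText.indexOf(selection);
            if (first >= 0 && canonicalText.indexOf(selection, first + 1) < 0) {
                return first;
            }
            // Ambiguous or missing: we simply don't display the annotation.
            return -1;
        }
    }

Real heuristics would of course be fuzzier than exact string matching, but the principle is the same.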

Now the problem is: what are reasonable heuristics that work in the most common cases (80/20 rule)? We proposed one.

> The part where annotations appear depending on user rights, sounds  
> cool, but
> how can you detect when the dynamic content changes and fix your
> annotations? (like you do for static content)
>
Again, heuristics.

In the case where there is no generated content (what the user sees is all contained in a single page) you can rely on a diff from the previous version of the page to understand what happened (adjusting the annotations accordingly). This, imho, should work perfectly.
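A minimal sketch of what I mean, assuming the diff comes as a sorted list of hunks (position in the old text, characters removed, characters inserted; the Hunk type is invented):

    import java.util.List;

    public final class OffsetAdjuster
    {
        public static final class Hunk
        {
            final int position;       // start offset of the change in the old text
            final int removedLength;  // number of characters removed
            final int insertedLength; // number of characters inserted

            Hunk(int position, int removedLength, int insertedLength)
            {
                this.position = position;
                this.removedLength = removedLength;
                this.insertedLength = insertedLength;
            }
        }

        /**
         * @return the adjusted start offset of an annotation, or -1 if the
         *         annotated text itself was touched by the diff
         */
        public int adjust(int start, int length, List<Hunk> hunks)
        {
            int adjusted = start;
            for (Hunk hunk : hunks) {
                if (hunk.position + hunk.removedLength <= start) {
                    // The change is entirely before the annotation: shift it.
                    adjusted += hunk.insertedLength - hunk.removedLength;
                } else if (hunk.position < start + length) {
                    // The change overlaps the annotated text: fall back to
                    // the heuristics (or mark the annotation stale).
                    return -1;
                }
                // Changes entirely after the annotation don't affect it.
            }
            return adjusted;
        }
    }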

In the case of generated content you might not be able to do a diff (because you don't know where the content came from, and consequently what changed), but you can still do some smart things in order to "guess" what happened to your annotation. And if you are unable to make this guess, then you display in a box that there are "stale" annotations that were there before and cannot be placed anymore.
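Put together, the display logic would look roughly like this (again a sketch, all types invented):

    import java.util.ArrayList;
    import java.util.List;

    // Invented interface, just to show the fallback logic.
    interface Annotation
    {
        String getContextLeft();
        String getSelection();
        String getContextRight();
    }

    public final class AnnotationDisplayer
    {
        private final AnnotationLocator locator = new AnnotationLocator();
        private final List<Annotation> staleBox = new ArrayList<Annotation>();

        public void display(String canonicalText, Annotation ann)
        {
            int offset = locator.locate(canonicalText, ann.getContextLeft(),
                ann.getSelection(), ann.getContextRight());
            if (offset >= 0) {
                // Highlight ann.getSelection() at 'offset' in the rendered page.
            } else {
                // We cannot guess where the annotated text went: collect the
                // annotation for the "stale annotations" box instead.
                staleBox.add(ann);
            }
        }
    }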

> While I'm not convinced about this approach, you may be right and,  
> comparing
> with the existing one (which I did not take the time to understand  
> in detail
> as you had), and other issues which you underlined in your reply, it  
> sounds
> like a start.
>
It's surely a start. But what was clear during our discussions with Anca was that offsets are brittle and cumbersome when content comes from different sources: if you want to use offsets to annotate a blog post, for example, you should be able to say that the annotation starts at offset X of field Y of object Z on page P (or some variant of this for every possible content source). Who gives you this information if all you can see in the requested page is a #include('something')? How could you encode this information in a standard way? Far too complicated.
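Just to make the point concrete: for the object case alone the anchor would need to carry something like this (invented names), plus a different variant for every other content source, and nothing in a page that only contains #include('something') tells you how to fill it in:

    public final class ObjectFieldAnchor
    {
        final String page;       // P: the document holding the object
        final String className;  // Z: the class of the object
        final int objectNumber;  // Z: which instance of it on that page
        final String field;      // Y: the object field the text comes from
        final int offset;        // X: character offset inside that field

        ObjectFieldAnchor(String page, String className, int objectNumber,
            String field, int offset)
        {
            this.page = page;
            this.className = className;
            this.objectNumber = objectNumber;
            this.field = field;
            this.offset = offset;
        }
    }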

Since we have a lot of use cases of this type (blog posts, watch feeds, and in general data coming from objects and displayed using general-purpose languages) we should think about another, simpler solution.

The proposed one is not perfect (this is the price to pay for having such a powerful wiki platform that allows you to do whatever is computable), but it should work nicely in most cases. As I said before, it should cover correctly all the cases where documents are self-contained (i.e., all the use cases where the current annotation system works).

Returning to your remark at the beginning about the storage... that's a tradeoff. Certainly, the more data we store, the higher the degree of correctness in the dynamic cases.

I hope this clarifies a little what we are trying to achieve.

Anyway, if you have more ideas/comments don't hesitate.

-Fabio

P.S.: Offsets could be useful in the heuristics too, and we could continue to store them as well. In fact they could help to locate, more or less, where the annotation was made. But they should only give a hint, not precise information.
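For example (sketch): when the selection occurs several times in the canonical text, the stored offset can disambiguate by picking the closest occurrence:

    public int locateWithHint(String canonicalText, String selection, int hintOffset)
    {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = canonicalText.indexOf(selection); i >= 0;
            i = canonicalText.indexOf(selection, i + 1)) {
            int distance = Math.abs(i - hintOffset);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best; // -1 if the selection is gone entirely
    }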

