Hey Christopher,

On 2011-05-03, at 6:22 AM, Christopher Armstrong wrote:

> I've put together some patches for ObjectMerging4 that:
> 1) Add makefiles for GNUstep
> 2) Fix compilation errors
> 3) Add primitive value persistence
> 
> Please find them attached at: https://gna.org/patch/index.php?2652
> 
Cool. I haven't had a good look yet.

I'm going to try to do some experimenting over the weekend with ObjectMerging, 
though.

Last week I chatted with Quentin and we settled a few things:

- the undo/redo stack/tree must be persistent, so your computer restarting, or 
a process restarting, doesn't clear the stack.
- the undo/redo user commands must apply to revision control operations like 
create branch, switch branch, merge, revert to previous version, etc. My 
favourite way of justifying this is to just look at the number of questions on 
stackoverflow.com of the form "Help, I accidentally did XYZ to my git/svn/hg 
repository, how do I undo it?" ;-)
- applications need to have significant control over how undo/redo are 
implemented. It's part of the UI design and we don't want the database to be 
too limiting. 
- the main failing of my ProjectDemo app was that I never attempted to 
implement normal undo/redo. It was still a useful testbed for validating 
ideas, though.
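
To make the first point concrete, here is a minimal sketch of an undo stack 
that survives a restart. Everything in it (the PersistentUndoStack class, the 
undo_log table and its columns) is made up for illustration and is not OM4's 
schema. It is Python with the standard sqlite3 module rather than Objective-C, 
and it models a linear stack, not a tree.

```python
import os
import sqlite3
import tempfile


class PersistentUndoStack:
    """Undo/redo log stored in SQLite so it outlives the process.

    Hypothetical schema for illustration only; not taken from OM4.
    """

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS undo_log ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " action TEXT NOT NULL,"
            " undone INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def record(self, action):
        # A new action discards anything that was undone (linear history).
        self.db.execute("DELETE FROM undo_log WHERE undone = 1")
        self.db.execute("INSERT INTO undo_log (action) VALUES (?)", (action,))
        self.db.commit()

    def undo(self):
        # Mark the newest action that is still applied as undone.
        row = self.db.execute(
            "SELECT seq, action FROM undo_log"
            " WHERE undone = 0 ORDER BY seq DESC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE undo_log SET undone = 1 WHERE seq = ?", (row[0],))
        self.db.commit()
        return row[1]

    def redo(self):
        # Re-apply the oldest undone action, i.e. the one undone most recently.
        row = self.db.execute(
            "SELECT seq, action FROM undo_log"
            " WHERE undone = 1 ORDER BY seq ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE undo_log SET undone = 0 WHERE seq = ?", (row[0],))
        self.db.commit()
        return row[1]

    def close(self):
        self.db.close()
```

Because revision control operations ("create branch", "merge", ...) are 
recorded as ordinary entries, the same undo command covers them. Closing the 
stack and reopening the same file gives back the same history.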

We decided that I should try to revise OM4 with the above points in mind and 
try to build a simple composite document editor to validate it.

> Furthermore, I wanted to promote some discussion on OM4. I've identified some 
> issues that come to mind:
> 
> 1) Object Roots
> 
> OM4 does not seem to provide a way to separate "core objects" so that you 
> have separate object graphs that cannot overlap - it seems everything in the 
> same repository can reference any other object in the repository. 

I originally did this because I wanted to avoid giving special properties to 
'root' objects. My motivation for that principle is that taking a 'root' 
document and moving it inside another document shouldn't change its properties 
too much (how it responds to revision control, etc.). I'm not totally sure how 
important that is, though.

> On first thought, this seems to be okay, but it makes it almost impossible to 
> work on the same repository from different processes without a 
> synchronisation or notification mechanism (which I don't think sqlite3 can 
> provide).

Yeah, it'll take some extra work to support simultaneous editing between 
multiple processes.

> We need object roots at some point so that we can identify the top-level 
> objects in a user's workspace during search.

I'm not sure it's absolutely necessary - say you have a set of search results 
which include sub-nodes of documents; you could navigate through the nodes' 
parents until you hit document root nodes.
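
Roughly what I have in mind, as a sketch. The Node class and its 
is_document_root flag are hypothetical stand-ins for whatever the real object 
model exposes, and it's Python just to keep it short:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a persistent object in the graph.
    name: str
    parent: Optional["Node"] = None
    is_document_root: bool = False


def enclosing_document(node):
    """Walk up the parent chain and return the nearest document root.

    Returns None if no ancestor (or the node itself) is a document root.
    """
    current = node
    while current is not None:
        if current.is_document_root:
            return current
        current = current.parent
    return None
```

Note that with a document embedded inside another document this returns the 
innermost one; continuing the walk until the parent is None would give the 
top-level one instead.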

> Having a table for them would also help with indexing because we could store 
> the object's type (used for opening it) and the date it was last 
> created/accessed so that we can improve search results (i.e. put more 
> recently accessed objects first in a search).

That sounds good.

Overall, segregating objects is probably a good idea. My hunch is that it 
should be mostly an implementation detail and not have any observable effect on 
the use of the library, but I'm not sure.
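
For the roots table, something along these lines might be enough to start 
with. This is only a sketch: the object_roots table, its columns and the 
helper functions are names I'm inventing here, and timestamps are assumed to 
be ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3


def make_root_index(db):
    # Hypothetical table: one row per top-level object.
    db.execute(
        "CREATE TABLE IF NOT EXISTS object_roots ("
        " uuid TEXT PRIMARY KEY,"
        " type TEXT NOT NULL,"
        " last_accessed TEXT NOT NULL)"
    )


def touch_root(db, uuid, type_name, when):
    # Insert the root, or update its type and access time if it already exists.
    db.execute(
        "INSERT OR REPLACE INTO object_roots (uuid, type, last_accessed)"
        " VALUES (?, ?, ?)",
        (uuid, type_name, when),
    )


def rank_by_recency(db, uuids):
    """Return (uuid, type) for the given roots, most recently accessed first."""
    uuids = list(uuids)
    if not uuids:
        return []
    placeholders = ",".join("?" for _ in uuids)
    return db.execute(
        "SELECT uuid, type FROM object_roots"
        f" WHERE uuid IN ({placeholders})"
        " ORDER BY last_accessed DESC",
        uuids,
    ).fetchall()
```

A search would first collect the matching roots and then pass their UUIDs 
through rank_by_recency before showing them. The type column is what the UI 
would use to decide how to open each result.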

> 2) Property storage
> 
> At the moment we are persisting objects using XML property lists. I think 
> this could reduce the quality of text-based searches, as any search will pick 
> up XML nodes and things like dictionary keys, which shouldn't be indexed. By 
> the looks of things, we could use a custom tokenizer with the FTS (full text 
> search) mechanism (http://www.sqlite.org/fts3.html#tokenizer), but this would 
> be easier if we stored them as old OpenStep property lists.

I don't feel too strongly about XML vs OpenStep plist. The choice shouldn't 
have any effect on FTS, though, because I just index the values of string 
properties, not the whole file.
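
To illustrate the idea (this is not the OM4 code, just a sketch using Python's 
standard plistlib; the indexable_strings name is made up): parse the property 
list and hand only the string values to the indexer, so neither the XML markup 
nor the dictionary keys ever reach FTS.

```python
import plistlib


def indexable_strings(plist_bytes):
    """Collect the string values in a property list, ignoring keys and markup."""
    root = plistlib.loads(plist_bytes)
    found = []

    def walk(value):
        if isinstance(value, str):
            found.append(value)
        elif isinstance(value, dict):
            # Dictionary keys are deliberately skipped; only values are indexed.
            for child in value.values():
                walk(child)
        elif isinstance(value, (list, tuple)):
            for child in value:
                walk(child)
        # Numbers, dates, booleans and data blobs are not text, so skip them.

    walk(root)
    return found
```

The returned strings are what would be inserted into the full-text table, so 
the on-disk plist format doesn't matter to the search results.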

> For simple properties, we should probably have a type column that specifies 
> its type, even if we store the value as a string. For this case, sqlite 
> provides almost automatic type conversion, which we could use to our 
> advantage to help make search results more relevant. For example, if the user 
> types in a string that can be converted to a date, we could also do a "date" 
> search on the database against any date-type properties. This will expand the 
> list of relevant results, especially where data is stored in a different 
> locale to the way the user is using it (e.g. different date or number 
> formats).

Something like that sounds good. Along the same lines, we could extend 
stemming/tokenizing to dates - we just need a date parser that can locate 
loosely formatted dates in text. Then we can search for "June 9" and pick up 
documents containing "06/09/11", say, since they both get stemmed to the same 
date representation.
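
A very rough sketch of what I mean by stemming dates, in Python. The regexes, 
the date_tokens function and the "date:MM-DD" token format are all invented 
for illustration. It assumes US month/day order for numeric dates and drops 
the year, so that a date written without a year can still match one written 
with it. A real date parser would need to handle far more formats and locales.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Numeric dates such as 06/09/11 or 6/9/2011 (assumed month/day/year).
NUMERIC_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\b")
# Named dates such as "June 9".
NAMED_DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})\b", re.IGNORECASE
)


def date_tokens(text):
    """Return the set of normalised date tokens found in the text."""
    tokens = set()
    for match in NUMERIC_DATE.finditer(text):
        month, day = int(match.group(1)), int(match.group(2))
        if 1 <= month <= 12 and 1 <= day <= 31:
            tokens.add(f"date:{month:02d}-{day:02d}")
    for match in NAMED_DATE.finditer(text):
        month = MONTHS[match.group(1).lower()]
        day = int(match.group(2))
        if 1 <= day <= 31:
            tokens.add(f"date:{month:02d}-{day:02d}")
    return tokens
```

Running the same function over both the indexed text and the query means 
"June 9" and "06/09/11" end up with a token in common, which is all the 
index needs in order to match them.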

btw, I did a bit of research a while ago on stemmers for text and it sounds 
like the state-of-the-art free one is Hunspell (as well as being the 
state-of-the-art free spell checker; it's the openoffice.org spellchecker). 
I've done some quick tests with it but that's about all.

Cheers
Eric
_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev
