Hi,

On Wed, May 21, 2008 at 1:37 PM, Martin Langhoff
<[EMAIL PROTECTED]> wrote:
>
> Any idea if someone can lend a hand with the DS issues I mentioned in
> my opening post? To recap:
>
>  - Add a "dump all metadata to a file" mechanism in
> datastore/xapianindex.py that is fast. It could be one file per
> document, that wouldn't bother me in the least. As long as the
> resulting format is a JSON dump of a reasonable datastructure, I'm a
> happy camper.
>
>  - Sort out the story with pause()/unpause(). The functions in
> datastore.py are meant to "support backup", but I think they are
> broken. Reading through the implementation, they call stop() on the
> backends, which in the case of Xapian, means that the datastore is
> dead in the water while paused, and normal usage will fail.

The attached patch maintains a copy of each object's metadata outside
the Xapian index. How it works (a rough sketch follows the list):

- at every create and update, a JSON file is written next to the
object's data file;

- the JSON file is deleted along with the object;

- at startup, if the file <datastore_path>/.metadata.exported doesn't
exist, check how many objects still need their metadata exported
(0.8s for 3000 entries);

- in an idle callback, export one of those objects per iteration
(3ms per entry with simplejson, 2ms with cjson).
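
To make the mechanism concrete, here is a rough sketch of the idea.
Names like metadata_path(), get_all_uids() and get_metadata() are
illustrative only, not what the actual patch uses:

import os
import gobject

try:
    import cjson                      # ~2ms per entry in my tests
    json_encode = cjson.encode
except ImportError:
    import simplejson                 # ~3ms per entry
    json_encode = simplejson.dumps

def metadata_path(datastore_path, uid):
    # The JSON sidecar lives next to the object's data file.
    return os.path.join(datastore_path, uid + '.metadata')

def write_metadata(datastore_path, uid, metadata):
    # Called on every create and update; delete() just unlinks this file.
    f = open(metadata_path(datastore_path, uid), 'w')
    try:
        f.write(json_encode(metadata))
    finally:
        f.close()

def export_metadata(datastore_path, get_all_uids, get_metadata):
    # One-shot export for preexisting entries, gated by a marker file.
    marker = os.path.join(datastore_path, '.metadata.exported')
    if os.path.exists(marker):
        return
    pending = [uid for uid in get_all_uids()
               if not os.path.exists(metadata_path(datastore_path, uid))]

    def process_one():
        if not pending:
            open(marker, 'w').close()   # all done, never scan again
            return False                # remove the idle callback
        uid = pending.pop()
        write_metadata(datastore_path, uid, get_metadata(uid))
        return True                     # one entry per idle iteration

    gobject.idle_add(process_one)

Doing one entry per idle iteration is what keeps the datastore
responsive while the backlog drains.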

In my tests this has worked quite well, but I have one concern: can
anything bad happen once we have 20k files in the same dir (one data
file plus one JSON file per entry, for a journal with 10k entries)?
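
If that turns out to be a problem, one common mitigation (just an
idea, not in the patch) is to fan the files out into subdirectories by
uid prefix, the way git stores its objects:

import os

def sharded_metadata_path(datastore_path, uid):
    # e.g. <datastore_path>/ab/abcdef42.metadata -- caps each
    # directory at roughly total_files / 256 entries for hex uids.
    return os.path.join(datastore_path, uid[:2], uid + '.metadata')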

One side effect: when (if) we agree on a new on-disk data structure
for the DS, conversion will be easier than if we had to extract all
the metadata from the index.

Regards,

Tomeu
