Hi,

On Wed, May 21, 2008 at 1:37 PM, Martin Langhoff <[EMAIL PROTECTED]> wrote:
>
> Any idea if someone can lend a hand with the DS issues I mentioned in
> my opening post? To recap:
>
> - Add a "dump all metadata to a file" mechanism in
> datastore/xapianindex.py that is fast. It could be one file per
> document, that wouldn't bother me in the least. As long as the
> resulting format is a JSON dump of a reasonable datastructure, I'm a
> happy camper.
>
> - Sort out the story with pause()/unpause(). The functions in
> datastore.py are meant to "support backup", but I think they are
> broken. Reading through the implementation, they call stop() on the
> backends, which in the case of Xapian, means that the datastore is
> dead in the water while paused, and normal usage will fail.
the attached patch maintains a copy of each object's metadata outside the Xapian index. How it works (a sketch is in the P.S. below):

- at every create and update, a JSON file is written next to the object's file,
- that JSON file is also deleted along with the object,
- at startup, if the file <datastore_path>/.metadata.exported doesn't exist, check how many objects still need their metadata exported (0.8s for 3000 entries),
- in an idle callback, process each of those objects, one per iteration (3ms per entry with simplejson, 2ms with cjson).

In my tests this has worked quite well, but I have one concern: can something bad happen if we have 20k files in the same dir (for a journal with 10k entries)?

One side effect of this is that when (if) we agree on a new on-disk data structure for the DS, it will be easier to convert to it than if we had to extract all the metadata from the index.

Regards,

Tomeu
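P.S. To make the flow concrete, here is a minimal sketch of the mechanism, not the attached patch itself: the sidecar name <uid>.metadata, the helper names, and the get_all_uids()/get_metadata() callables are illustrative assumptions on my part; only the <datastore_path>/.metadata.exported marker comes from the description above.

import os
import json  # the patch uses simplejson/cjson; stdlib json keeps this sketch self-contained

import gobject  # PyGTK-era main loop; with modern PyGObject this would be GLib.idle_add

MARKER = '.metadata.exported'


def _sidecar_path(data_path, uid):
    # Assumed layout: the object's file lives at <data_path>/<uid>, so the
    # JSON copy of its metadata sits next to it as <data_path>/<uid>.metadata.
    return os.path.join(data_path, uid + '.metadata')


def write_metadata_sidecar(data_path, uid, metadata):
    # Called on every create and update.
    f = open(_sidecar_path(data_path, uid), 'w')
    try:
        json.dump(metadata, f)
    finally:
        f.close()


def delete_metadata_sidecar(data_path, uid):
    # Called when the object itself is deleted.
    path = _sidecar_path(data_path, uid)
    if os.path.exists(path):
        os.remove(path)


def schedule_metadata_export(data_path, get_all_uids, get_metadata):
    # At startup: if the marker file exists, everything was exported already.
    marker = os.path.join(data_path, MARKER)
    if os.path.exists(marker):
        return
    # Otherwise collect the entries that still lack a sidecar (the ~0.8s scan).
    pending = [uid for uid in get_all_uids()
               if not os.path.exists(_sidecar_path(data_path, uid))]

    def export_one():
        if pending:
            uid = pending.pop()
            write_metadata_sidecar(data_path, uid, get_metadata(uid))
            return True  # reschedule: one object per idle iteration
        # Done: touch the marker so the next startup skips the scan.
        open(marker, 'w').close()
        return False

    gobject.idle_add(export_one)

The marker file makes the startup scan a one-off cost, and exporting one entry per idle iteration keeps the datastore responsive while the backlog drains.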
