Michael,

How would IndexWriter.addIndexes() work with unique doc ids?
Regards,
Paul Elschot

On Tuesday 22 January 2008 12:07:16, Michael Busch wrote:
> Hi Team,
>
> The question of how to delete with IndexWriter using doc ids is
> currently being discussed on java-user
> (http://www.gossamer-threads.com/lists/lucene/java-user/57228), so I
> thought this is a good time to mention an idea that I recently had. I'm
> planning to work on column-stored fields soon (I used to call them
> per-document payloads). Then we'll have the ability to store metadata
> for each document very efficiently in the index.
>
> This new data structure could be used to store a unique ID for each doc
> in the index. The IndexReader would then get an API that provides a
> mapping from the dynamic doc ids to the new unique ones. We would also
> have to store a reverse mapping (UID -> ID) in the index; we could use
> a VInt list + skip list for that.
>
> Then we should be able to make IndexReaders "read-only" (LUCENE-1030)
> and provide a new API in IndexWriter, "delete by UID". This would also
> allow "delete by query". The disadvantage is that the index would
> become bigger, but that should still be OK: 8 bytes per doc for the
> ID->UID map (assuming we used a long for the UID, which I'd suggest). The
> UID->ID map might even be a bit smaller initially (using VInts and
> VLongs), but might become bigger when the index has lots of deleted
> docs, because then the delta encoding wouldn't be as efficient anymore
> for the UIDs.
>
> If RAM permits, the maps could also be cached in memory (optional,
> configurable). The FieldCache overhaul (LUCENE-831) with column fields
> as source can help here.
>
> After all this is implemented (column fields, UIDs, "read-only"
> IndexReaders, FieldCache overhaul) I'd like to make the column fields
> (and norms) updateable via IndexWriter.
>
> OK, lots of food for thought.
>
> -Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
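To make the size trade-off in the quoted proposal concrete, here is a minimal Java sketch of the two maps it describes: a forward ID -> UID array of longs (8 bytes per doc) and a reverse UID -> ID list, delta-encoded with VInt/VLong-style varints and indexed by a skip table. The class name `UidMapSketch`, the block size `SKIP_INTERVAL = 16`, and the single-level skip table that restarts delta coding at each block are assumptions for illustration, swapped in for a full multi-level skip list; none of this is a Lucene API, and a real implementation would live in the index file format rather than in heap arrays.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Lucene API) of the proposed UID maps:
 * forward ID -> UID as a long[], reverse UID -> ID as a delta-encoded
 * varint list with a simple single-level skip table.
 */
public class UidMapSketch {
    static final int SKIP_INTERVAL = 16;   // entries per skip block (assumed value)

    final long[] idToUid;    // forward map: 8 bytes per document
    final byte[] reverse;    // (uid delta, doc id) pairs, sorted by UID
    final long[] skipUids;   // first UID of each block, for binary search
    final int[] skipOffsets; // byte offset of each block in 'reverse'
    final int size;

    UidMapSketch(long[] uids) {
        final long[] copy = uids.clone();
        idToUid = copy;
        size = copy.length;
        // Order doc ids by UID so the reverse list can be delta-encoded.
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(copy[a], copy[b]));

        int blocks = (size + SKIP_INTERVAL - 1) / SKIP_INTERVAL;
        skipUids = new long[blocks];
        skipOffsets = new int[blocks];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (int i = 0; i < size; i++) {
            int doc = order[i];
            long uid = copy[doc];
            if (i % SKIP_INTERVAL == 0) {
                // New block: record a skip entry and restart delta coding
                // so a lookup can begin decoding at this offset.
                skipUids[i / SKIP_INTERVAL] = uid;
                skipOffsets[i / SKIP_INTERVAL] = out.size();
                prev = 0;
            }
            writeVLong(out, uid - prev); // small when UIDs are dense
            writeVLong(out, doc);
            prev = uid;
        }
        reverse = out.toByteArray();
    }

    long uid(int docId) {
        return idToUid[docId];
    }

    /** Returns the doc id for a UID, or -1 if the UID is not present. */
    int docId(long uid) {
        int k = Arrays.binarySearch(skipUids, uid);
        if (k < 0) k = -k - 2;   // not found: last block whose first UID is < uid
        if (k < 0) return -1;    // uid is below every stored UID
        int[] pos = {skipOffsets[k]};
        int end = Math.min(size, (k + 1) * SKIP_INTERVAL);
        long prev = 0;
        for (int i = k * SKIP_INTERVAL; i < end; i++) {
            long cur = prev + readVLong(reverse, pos);
            int doc = (int) readVLong(reverse, pos);
            if (cur == uid) return doc;
            if (cur > uid) return -1; // list is sorted, so we have passed it
            prev = cur;
        }
        return -1;
    }

    // Varint: 7 bits per byte, low-order group first, high bit = "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVLong(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        // UIDs assigned in insertion order, with gaps as if some docs had been deleted.
        long[] uids = new long[100];
        for (int d = 0; d < uids.length; d++) uids[d] = 1000L + 3L * d;
        UidMapSketch map = new UidMapSketch(uids);
        System.out.println(map.uid(42));      // 1126
        System.out.println(map.docId(1126L)); // 42
        System.out.println(map.docId(1127L)); // -1
        System.out.println("forward: " + 8 * uids.length + " bytes, reverse: "
                + map.reverse.length + " bytes"); // forward: 800 bytes, reverse: 207 bytes
    }
}
```

In this toy run the forward map costs 800 bytes for 100 docs, while the reverse list needs only 207 bytes because gaps of 3 encode as one-byte deltas. Wider gaps between surviving UIDs would push each delta to two or more bytes, which is the growth the proposal expects for indexes with many deleted docs.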