On Oct 21, 2010, at 3:44 PM, eks dev wrote:

> Hi All,
> I am trying to figure out a way to implement the following use case with
> lucene/solr.
>
> In order to support simple incremental updates (master) I need to index and
> store a UID field on a 300Mio collection. (My UID is a 32 byte sequence.) But
> I do not need it indexed (only stored) during normal searching (slaves).
>
> The problem is that my term dictionary gets blown away by the sheer number of
> unique IDs. The number of unique terms on this collection, excluding UID, is
> less than 7Mio.
> I can tolerate the resource hit on the Updater (big hardware, on disk index...).
>
> This is a master slave setup, where searchers run from RAMDisk, and having
> 300Mio * 32 (give or take prefix compression) plus pointers to postings and
> postings is something I would really love to avoid, as this is significant
> compared to the really small documents I have.
>
> Cutting to the chase:
> How can I have an indexed UID field, and when done with indexing:
> 1) Load a "searchable" index into RAM from such an index on disk without one
> field?

That doesn't seem like it would be all that hard to do in Lucene with a few edits to the appropriate low-level classes to simply not load the term dictionary for a particular set of fields (pass in a set?). This sort of masking even seems like a generally useful performance gain in the typical master/worker replicated environment.

> 2) create 2 indices in sync on docIDs, one containing only the indexed UID

Kind of reminds me of Andrzej's pruning codec stuff. Perhaps the new Flex stuff helps here?

> 3) somehow transform the index with indexed UID by dropping the UID field,
> preserving docIDs. Kind of a smart index-editing tool.

Again, take a look at Andrzej's pruning codec.

-Grant
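
To make the "pass in a set?" idea for 1) a bit more concrete, here is a toy sketch in plain Java. It does not touch Lucene at all: the map standing in for the on-disk term dictionaries, the class name FieldMaskSketch, and the method loadTermDictionaries are all hypothetical, and the real change would have to live in the low-level reader classes mentioned above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: NOT Lucene code. A map of field name -> sorted terms
// stands in for the per-field term dictionaries stored on disk.
class FieldMaskSketch {

    // Copies into the in-RAM view only the fields that are not masked.
    // The on-disk map is left untouched, so the master can still use the UID field.
    static Map<String, SortedSet<String>> loadTermDictionaries(
            Map<String, SortedSet<String>> onDisk, Set<String> maskedFields) {
        Map<String, SortedSet<String>> inRam = new HashMap<>();
        for (Map.Entry<String, SortedSet<String>> entry : onDisk.entrySet()) {
            if (maskedFields.contains(entry.getKey())) {
                continue; // skip the term dictionary of a masked field entirely
            }
            inRam.put(entry.getKey(), entry.getValue());
        }
        return inRam;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> onDisk = new HashMap<>();
        onDisk.put("uid", new TreeSet<>(Arrays.asList("id-0001", "id-0002")));
        onDisk.put("body", new TreeSet<>(Arrays.asList("lucene", "solr")));

        // A searcher (slave) masks "uid"; the updater (master) would pass an empty set.
        Map<String, SortedSet<String>> searcherView =
                loadTermDictionaries(onDisk, Collections.singleton("uid"));
        System.out.println(searcherView.keySet());
    }
}
```

The point of passing the masked fields in as a set is that the same index files can serve both roles: the updater loads everything, while the searchers skip the UID terms and never pay the RAM cost for them.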