Re: Polymorphic Index

Grant Ingersoll Fri, 22 Oct 2010 06:27:02 -0700

On Oct 21, 2010, at 3:44 PM, eks dev wrote:

> Hi All,
> I am trying to figure out a way to implement the following use case with
> Lucene/Solr.
> 
> 
> In order to support simple incremental updates (master), I need to index and
> store a UID field on a 300Mio-document collection. (My UID is a 32-byte
> sequence.) But I do not need it indexed (only stored) during normal
> searching (slaves).
> 
> 
> The problem is that my term dictionary gets blown up by the sheer number of
> unique IDs. The number of unique terms in this collection, excluding the UID,
> is less than 7Mio.
> I can tolerate the resource hit on the updater (big hardware, on-disk index...).
> 
> This is a master/slave setup where the searchers run from a RAM disk, and
> having 300Mio * 32 bytes (give or take prefix compression), plus pointers to
> postings and the postings themselves, is something I would really love to
> avoid, as this is significant compared to the really small documents I have.
> 
> 
> Cutting to the chase:
> How can I have an indexed UID field and, when done with indexing:
> 1) Load a "searchable" index into RAM from such an on-disk index, without
> that one field?


That doesn't seem like it would be all that hard to do in Lucene with a few
edits to the appropriate low-level classes to simply not load the term
dictionary for a particular set of fields (pass in a set?). This sort of
masking even seems like a generally useful performance gain in the typical
master/worker replicated environment.
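
To make the masking idea concrete, here is a toy sketch in Python. It is not
Lucene code: the on-disk index is modeled as a plain dict, and
load_term_dictionary, skip_fields and raw_term_bytes are hypothetical names
chosen for illustration. It only shows the shape of the change: the loader
takes a set of field names and never materializes their terms in RAM.

```python
# Toy model only: an "on-disk" index as {field: {term: [docIDs]}}.
# None of these names are Lucene APIs; they are assumptions for illustration.

def load_term_dictionary(on_disk_index, skip_fields=frozenset()):
    """Return an in-memory copy of the index without the skipped fields."""
    return {
        field: dict(terms)
        for field, terms in on_disk_index.items()
        if field not in skip_fields
    }

def raw_term_bytes(index):
    """Sum of UTF-8 term lengths: a crude lower bound on dictionary size."""
    return sum(len(term.encode("utf-8"))
               for terms in index.values()
               for term in terms)

on_disk = {
    "uid": {"a" * 32: [0], "b" * 32: [1], "c" * 32: [2]},
    "body": {"lucene": [0, 2], "solr": [1]},
}

# The searcher passes in the set of fields it does not want in RAM.
searchable = load_term_dictionary(on_disk, skip_fields={"uid"})
```

At the scale described in the original mail, the raw UID term text that would
be skipped is about 300,000,000 * 32 bytes = 9.6 GB, before prefix compression
and before counting pointers and postings.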

> 
> 2) Create 2 indices in sync on docIDs, one containing only the indexed UID?

Kind of reminds me of Andrzej's pruning codec stuff.  Perhaps the new Flex 
stuff helps here?
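
Option 2 can be pictured with a toy model as well, assuming the two indices
are built so that a document gets the same docID in both. This is not Lucene
or codec code, and uid_index, search_index and find_doc_id are hypothetical
names. The point is only that the master resolves a UID to a docID in the
UID-only structure, and that docID is then valid in the index the searchers
load.

```python
# Toy model, not Lucene: two structures that share one docID space.
docs = [
    {"uid": "u-001", "body": "polymorphic index"},
    {"uid": "u-002", "body": "term dictionary"},
]

uid_index = {}      # UID term -> docID; would live only on the master
search_index = {}   # body term -> list of docIDs; shipped to the searchers

for doc_id, doc in enumerate(docs):   # one enumeration keeps docIDs aligned
    uid_index[doc["uid"]] = doc_id
    for term in doc["body"].split():
        search_index.setdefault(term, []).append(doc_id)

def find_doc_id(uid):
    """Resolve a UID on the master; returns None if the UID is unknown."""
    return uid_index.get(uid)
```

The searchers never see uid_index, so their term dictionary holds only the
body terms, while updates on the master can still locate a document by UID.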

> 3) Somehow transform the index with the indexed UID by dropping the UID field,
> preserving docIDs? Kind of a smart index-editing tool.

Again, take a look at Andrzej's pruning codec.

-Grant
