Greetings,

I have an index into which I import documents such as PowerPoint, PDF, and so
forth. One nice feature I added is that for each document, I store a thumbnail
of the first page as an encoded String (uuencode) in a stored, non-indexed
field. This thumbnail is displayed when the user finds a document.

I am wondering how efficient this is as the index grows to perhaps hundreds of
thousands, if not millions, of documents. Is it a good idea?
These encoded strings could be several hundred bytes in size, are of course
unique to each file indexed, and provide no 'search' value. On the surface, it
seems like there could be a better way to do this, given the size as well as
the extra retrieval time for Lucene to pull these fields for found documents.
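
To put a rough number on the size concern, here is a back-of-the-envelope
sketch. The 500-byte figure and the document counts are assumptions picked
only for illustration ("several hundred bytes", "hundreds of thousands if not
millions"), and the result is raw thumbnail data only; it ignores whatever
compression or per-field overhead Lucene applies. At 500 bytes each, one
million documents would come to 500,000,000 bytes, or roughly 477 MiB.

```java
public class ThumbnailSizeEstimate {

    // Raw bytes of encoded thumbnail data across the whole index.
    // Ignores any compression or storage overhead.
    static long totalBytes(long docCount, int bytesPerThumbnail) {
        return docCount * bytesPerThumbnail;
    }

    public static void main(String[] args) {
        int bytesPerThumbnail = 500; // assumption: "several hundred bytes"
        long[] docCounts = {100_000L, 1_000_000L, 5_000_000L}; // assumed sizes

        for (long n : docCounts) {
            double mib = totalBytes(n, bytesPerThumbnail) / (1024.0 * 1024.0);
            System.out.printf("%,d docs: about %.0f MiB of thumbnail data%n", n, mib);
        }
    }
}
```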

Since I also have a unique hash for each document in the index, it would not be 
too difficult to set up a separate, independent NoSQL key/value store with the 
thumbnail images, such as MongoDB or similar, and then retrieve the thumbnails 
from that store instead of keeping them in the Lucene index.  Does this seem 
like a better approach? Or is Lucene stored field retrieval efficient enough 
that there would be no benefit to doing this?  Any other ideas?
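
To make the alternative concrete, here is a minimal sketch of the lookup I
have in mind. It is not real MongoDB or Lucene code: the in-memory map is only
a stand-in for the external key/value store, and the class and method names
(ThumbnailStore, put, get, getEncoded) are hypothetical. It also uses Base64
from the JDK rather than uuencode, simply because it ships with Java. The idea
is that the search hit carries only the document hash, and the thumbnail is
fetched by that key at display time.

```java
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ThumbnailStore {

    // Stand-in for an external key/value store (MongoDB or similar).
    // Key: the unique document hash already kept in the index.
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    // Save the raw thumbnail bytes under the document's hash.
    public void put(String docHash, byte[] thumbnail) {
        store.put(docHash, thumbnail.clone());
    }

    // Fetch the raw thumbnail bytes; empty if none was stored.
    public Optional<byte[]> get(String docHash) {
        byte[] bytes = store.get(docHash);
        return bytes == null ? Optional.empty() : Optional.of(bytes.clone());
    }

    // Fetch the thumbnail as a Base64 string, e.g. for embedding in a page.
    public Optional<String> getEncoded(String docHash) {
        return get(docHash).map(b -> Base64.getEncoder().encodeToString(b));
    }

    public static void main(String[] args) {
        ThumbnailStore thumbnails = new ThumbnailStore();
        thumbnails.put("hash-of-doc-1", new byte[] {1, 2, 3}); // dummy image bytes

        // After a search, only the hash comes back from the index.
        String hashFromSearchHit = "hash-of-doc-1";
        System.out.println(thumbnails.getEncoded(hashFromSearchHit).orElse("(no thumbnail)"));
    }
}
```

With this layout the index would hold only the hash per document, at the cost
of one extra lookup per displayed result.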

Thanks in advance,
J


  


