Grant Ingersoll wrote:
On Jan 5, 2010, at 7:44 AM, Paul Taylor wrote:

So currently in my index I index and store a number of small fields. I need
both: the indexed versions so I can search on the fields, and the stored
versions to generate the output document (either an XML or a JSON
representation). Because I read that stored and indexed fields are dealt with
completely separately, I tried another tack: storing only one field, which was
a serialized version of the output document. This solves a couple of issues I
was having, but I was disappointed that both the index size and the index
build time increased. I thought that if all the stored data was held in one
field the resulting index would be smaller, and I didn't expect index time to
increase by as much as it did. I was also surprised that Java serialization
was slower and used more space than both JSON and XML serialization.

Results as follows:

Type                                                            Time   Index size
Only indexed, no norms                                           105        38 MB
Only indexed                                                     111        43 MB
Same fields indexed and stored (current situation)               115        83 MB
Fields indexed, one JAXB class stored using JSON marshalling     140       115 MB
Fields indexed, one JAXB class stored using XML marshalling      189       198 MB
Fields indexed, one JAXB class stored using Java serialization   305       485 MB
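One rough way to read these numbers is to treat the "Only indexed" run (43 MB) as the cost of the inverted index alone and subtract it from the other sizes, leaving an estimate of what the stored data adds. This assumes the stored runs index their fields exactly like that run (norms included); the class name StoredOverhead is purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StoredOverhead {
    // Baseline: "Only indexed" run from the results table, in MB.
    // Assumption: the stored runs index their fields the same way.
    static final int INDEXED_ONLY_MB = 43;

    // Returns the estimated extra size (MB) contributed by stored data.
    static Map<String, Integer> overheads() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("Indexed and stored (separate fields)", 83);
        sizes.put("JSON blob", 115);
        sizes.put("XML blob", 198);
        sizes.put("Java serialization blob", 485);

        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            result.put(e.getKey(), e.getValue() - INDEXED_ONLY_MB);
        }
        return result;
    }

    public static void main(String[] args) {
        overheads().forEach((k, v) -> System.out.println(k + ": ~" + v + " MB"));
    }
}
```

On that reading the separate stored fields add about 40 MB, the JSON blob about 72 MB (roughly 1.8 times as much), XML about 155 MB and Java serialization about 442 MB.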

How much more verbose are these than the "raw" content? Even as terse as JSON is, it is still verbose compared to a binary format, and XML marshalling and Java serialization will be even more so. Given that you are likely only displaying 10 or so at a time, I'd think it would be much more efficient to store only the minimal amount needed to recreate the docs in the current result set.
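To make the verbosity point concrete, here is a JDK-only comparison of one small record encoded three ways: length-prefixed binary via DataOutputStream, hand-built JSON, and standard Java serialization. The Artist class and its three fields are hypothetical. Java serialization writes a stream header and a class descriptor (class name, serialVersionUID, field names and types) alongside the values; if each document's stored field was produced with a fresh ObjectOutputStream, that descriptor would be repeated per document, which is a plausible (but assumed) reason it came out largest in the results table.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SerializedSizes {
    // Hypothetical record standing in for one output document.
    static class Artist implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String name;
        final String type;

        Artist(String id, String name, String type) {
            this.id = id;
            this.name = name;
            this.type = type;
        }
    }

    // Compact binary: 2-byte length prefix plus bytes per string, no field names.
    static int binarySize(Artist a) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(a.id);
            out.writeUTF(a.name);
            out.writeUTF(a.type);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.size();
    }

    // JSON repeats every field name plus quotes and punctuation.
    // No escaping is done; the sample values contain no special characters.
    static int jsonSize(Artist a) {
        String json = "{\"id\":\"" + a.id + "\",\"name\":\"" + a.name
                + "\",\"type\":\"" + a.type + "\"}";
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    // Java serialization: stream header + class descriptor + field values.
    static int javaSerializedSize(Artist a) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Artist a = new Artist("123", "Foo", "Group");
        System.out.println("binary: " + binarySize(a));        // 17
        System.out.println("json:   " + jsonSize(a));          // 40
        System.out.println("java:   " + javaSerializedSize(a));
    }
}
```

The exact Java serialization figure depends on the class and field names, so only the ordering (binary, then JSON, then Java serialization) should be taken from this.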
Yes, in the end I came to the conclusion to just stick with the current situation, except for cases where I have sets of related fields that would otherwise necessitate holding 'placeholder' fields; in those cases I've used JSON.
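What the 'placeholder' fields are is not spelled out in the thread, so this is one reading of it: if related values (say, each alias and its locale) are stored as parallel multi-valued fields, they can only be re-paired by position, so a missing value still needs a filler entry. The JDK-only sketch contrasts that with one JSON value per group; Alias, locale and the "-" placeholder are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RelatedFields {
    // Hypothetical group of related values: an alias and its optional locale.
    static final class Alias {
        final String name;
        final String locale; // may be null

        Alias(String name, String locale) {
            this.name = name;
            this.locale = locale;
        }
    }

    // Hypothetical filler value used to keep parallel fields aligned.
    static final String PLACEHOLDER = "-";

    // Parallel multi-valued fields are matched by position, so every alias
    // without a locale still needs a placeholder in the locale field.
    static List<String> localeValues(List<Alias> aliases) {
        List<String> values = new ArrayList<>();
        for (Alias a : aliases) {
            values.add(a.locale == null ? PLACEHOLDER : a.locale);
        }
        return values;
    }

    // One stored JSON value keeps each group together, so a missing
    // locale is simply omitted instead of padded.
    static String toJson(List<Alias> aliases) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < aliases.size(); i++) {
            Alias a = aliases.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":").append(quote(a.name));
            if (a.locale != null) {
                sb.append(",\"locale\":").append(quote(a.locale));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    // Minimal JSON string quoting: escapes only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        List<Alias> aliases = List.of(new Alias("Foo", "en"), new Alias("Le Foo", null));
        System.out.println(localeValues(aliases)); // [en, -]
        System.out.println(toJson(aliases));       // [{"name":"Foo","locale":"en"},{"name":"Le Foo"}]
    }
}
```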
I've also seen people have success simply storing a key in Lucene that is then 
used for lookup in something like Memcachedb, Tokyo Cabinet or one of the many 
other key-value stores.
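The key-only approach can be sketched as follows, assuming the Lucene index stores just one key field per document and that a search yields hit keys in ranked order. KeyLookup and fetchPage are hypothetical names, and the HashMap merely stands in for an external store such as Memcachedb or Tokyo Cabinet; none of their real client APIs are shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyLookup {
    // In-memory stand-in for an external key-value store, used only so the
    // sketch runs without dependencies.
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String document) {
        store.put(key, document);
    }

    // Given the ranked keys returned by the search, fetch the full output
    // documents only for the requested page of results.
    List<String> fetchPage(List<String> hitKeys, int offset, int pageSize) {
        List<String> page = new ArrayList<>();
        int end = Math.min(offset + pageSize, hitKeys.size());
        for (int i = offset; i < end; i++) {
            String doc = store.get(hitKeys.get(i));
            if (doc != null) { // skip keys missing from the store
                page.add(doc);
            }
        }
        return page;
    }
}
```

Only the page being displayed touches the external store, which matches Grant's point about storing just enough to recreate the docs in the current result set.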
In my situation 90% of the stored fields are also required for searching, so they are held in the search index anyway; there is not much point moving the stored versions into memcache.

thanks Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
