On Jan 5, 2010, at 7:44 AM, Paul Taylor wrote:

> Currently I both index and store a number of small fields in my index. I 
> need both: I search on the indexed fields, then use the stored versions to 
> generate the output document (either an XML or a JSON representation). 
> Because I read that stored and indexed fields are handled completely 
> separately, I tried another tack: storing only one field, a serialized 
> version of the output document. This solves a couple of issues I was 
> having, but I was disappointed that both the index size and the index 
> build time increased. I thought that if all the stored data was held in one 
> field the resulting index would be smaller, and I didn't expect index 
> time to increase by as much as it did. I was also surprised that Java 
> serialization was slower and used more space than both JSON and XML 
> serialization.
> 
> Results as follows:
> 
> Type                                                             Time : Index Size
> Only indexed, no norms                                           105  : 38 MB
> Only indexed                                                     111  : 43 MB
> Same fields written as Indexed and Stored (current situation)    115  : 83 MB
> Fields Indexed, one JAXB class Stored using JSON Marshalling     140  : 115 MB
> Fields Indexed, one JAXB class Stored using XML Marshalling      189  : 198 MB
> Fields Indexed, one JAXB class Stored using Java Serialization   305  : 485 MB

How much more verbose are these than the "raw" content?  Even as terse as JSON 
is, it is still verbose compared to a binary format, and XML marshalling and 
Java serialization will be even more so.  Given that you are likely only 
displaying 10 or so results at a time, I'd think it would be much more 
efficient to store only the minimal amount needed to recreate the docs in the 
current result set.
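
To get a feel for the overhead, here is a rough sketch that compares the byte 
count of default Java serialization against a hand-written binary encoding of 
the same small object.  The Doc class and its fields are hypothetical 
stand-ins, not the actual JAXB classes, and it uses only the JDK, no Lucene.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StoredSizeComparison {

    // Hypothetical stand-in for the output document; not the real JAXB class.
    public static class Doc implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final String artist;
        final int durationSeconds;

        public Doc(String title, String artist, int durationSeconds) {
            this.title = title;
            this.artist = artist;
            this.durationSeconds = durationSeconds;
        }
    }

    // Default Java serialization: the stream also carries a header and a
    // class descriptor (class name, serialVersionUID, field names and types).
    public static int javaSerializedSize(Doc doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(doc);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hand-written encoding: only the field values, in a fixed order.
    public static int compactSize(Doc doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(doc.title);   // 2-byte length + modified UTF-8
                out.writeUTF(doc.artist);
                out.writeInt(doc.durationSeconds);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc("Example Title", "Example Artist", 215);
        System.out.println("Java serialization:  " + javaSerializedSize(doc) + " bytes");
        System.out.println("Hand-written binary: " + compactSize(doc) + " bytes");
    }
}
```

For small documents the per-object class metadata dominates the serialized 
form, which would be consistent with the sizes you measured.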

I've also seen people have success simply storing a key in Lucene that is then 
used for lookup in something like Memcachedb, Tokyo Cabinet or one of the many 
other key-value stores.
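
A minimal sketch of that pattern, assuming the search hands back a list of 
stored keys in rank order.  The HashMap here is only a stand-in for a real 
key-value store client, and the class and method names are made up for 
illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyLookupSketch {

    // Stand-in for an external key-value store; a real setup would call
    // that store's client here instead of an in-memory map.
    private final Map<String, String> store = new HashMap<>();

    public void put(String key, String renderedDoc) {
        store.put(key, renderedDoc);
    }

    // Fetch the rendered documents for one page of hits only, so the cost
    // is proportional to the page size rather than the total hit count.
    public List<String> renderPage(List<String> hitKeys, int offset, int pageSize) {
        List<String> page = new ArrayList<>();
        int start = Math.max(0, offset);
        int end = Math.min(hitKeys.size(), start + pageSize);
        for (int i = start; i < end; i++) {
            String doc = store.get(hitKeys.get(i));
            if (doc != null) {          // skip keys missing from the store
                page.add(doc);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        KeyLookupSketch sketch = new KeyLookupSketch();
        sketch.put("id-1", "{\"title\":\"first\"}");
        sketch.put("id-2", "{\"title\":\"second\"}");
        sketch.put("id-3", "{\"title\":\"third\"}");

        // Keys as they might come back from the index, in rank order.
        List<String> hitKeys = Arrays.asList("id-3", "id-1", "id-2");
        for (String doc : sketch.renderPage(hitKeys, 0, 2)) {
            System.out.println(doc);
        }
    }
}
```

The index then only has to store one short key per document, and the 
serialized output lives outside Lucene.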

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search

