Grant Ingersoll wrote:
On Jan 5, 2010, at 7:44 AM, Paul Taylor wrote:
So currently in my index I index and store a number of small fields, I need
both so I can search on the fields, then I use the stored versions to generate
the output document (which is either an XML or JSON representation), because I
read stored and index fields are dealt with completely seperately I tried
another tact only storing one field which was a serialized version of the
output documentation. This solves a couple of issues I was having but I was
disappointed that both the size of the index increased and the index build
time increased, I thought that if all the stored data was held in one field
that the resultant index would be smaller, and I didn't expect index time to
increase by as much as it did. I was also suprised that Java serilaization was
slower and used more space than both JSON and XML serialization.
Results as Follows
Type: Time : Index
Size
Only indexed no norms
105 : 38 MB
Only indexed
111 : 43 MB
Same fields written as Indexed and Stored (current Situation) 115
: 83 MB
Fields Indexed, One JAXB classed Stored using JSON Marshalling 140 : 115 MB
Fields Indexed, One JAXB classed Stored using XML Marshalling 189 : 198 MB
Fields Indexed, One JAXB classed Stored using Java Serialization 305 : 485
MB
How much more verbose are these than the "raw" content? Even as terse as JSON is, it is still verbose compared to a binary format, and XML Marshalling and Java Serialization will be even more. Given that you are likely only displaying 10 or so at a time, I'd think it would be much more efficient to only store the minimal amount needed to recreate the docs in the current result set.
Yes, in the end I came to the conclusion to just stick with current
situation except for cases where i have sets of related fields that
would otherwise nessecitate holding 'placeholder' fields, in which case
I've used json
I've also seen people have success simply storing a key in Lucene that is then
used for lookup in something like Memcachedb, Tokyo Cabinet or one of the many
other key-value stores.
In my situation 90% of the fields stored are also required for
searching, so they are held in the search index anyway so there is not
much point moving the stored version into a memcahe
thanks Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org