[jira] Created: (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] to be more memory efficient.

Aaron McCurry (JIRA) Wed, 13 Jan 2010 00:29:20 -0800

Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the 
index pointer long[] to be more memory efficient.
---------------------------------------------------------------------------------------------------------------------------------


                 Key: LUCENE-2205
                 URL: https://issues.apache.org/jira/browse/LUCENE-2205
             Project: Lucene - Java
          Issue Type: Improvement
         Environment: Java5
            Reporter: Aaron McCurry


The change packs those three arrays into a single byte array, with an int 
array holding the offset of each entry.
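
To illustrate the general idea (this is not the code from the patch), here is a minimal sketch of packing per-term records into one byte[] and locating each record through an int[] of offsets. The class name PackedTermIndex, its methods, and the record layout (term length, UTF-8 term bytes, docFreq, index pointer) are assumptions made for the example; the real TermInfo carries more fields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; names and record layout are assumed, not taken from the patch. */
public class PackedTermIndex {
    private final byte[] data;   // all records stored back to back
    private final int[] offsets; // offsets[i] = start of record i inside data

    public PackedTermIndex(String[] terms, int[] docFreqs, long[] indexPointers) {
        // First pass: encode the term text and compute the total size.
        byte[][] texts = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            texts[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + texts[i].length + 4 + 8; // length + text + docFreq + pointer
        }
        // Second pass: write every record into one buffer and remember where it starts.
        ByteBuffer buf = ByteBuffer.allocate(total);
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = buf.position();
            buf.putInt(texts[i].length)
               .put(texts[i])
               .putInt(docFreqs[i])
               .putLong(indexPointers[i]);
        }
        data = buf.array();
    }

    /** Decodes the term text of record i. */
    public String term(int i) {
        int len = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + 4, len, StandardCharsets.UTF_8);
    }

    /** Reads the docFreq field, which follows the term text. */
    public int docFreq(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getInt(offsets[i] + 4 + len);
    }

    /** Reads the index pointer, which follows the docFreq field. */
    public long indexPointer(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getLong(offsets[i] + 4 + len + 4);
    }

    /** Number of records stored. */
    public int size() {
        return offsets.length;
    }
}
```

With this layout the JVM holds two array objects in total instead of one object per term and per TermInfo, which is presumably where the reported memory and GC savings come from.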

The performance benefits are staggering. On my test index (6.2 GB, with 
~1,000,000 documents and ~175,000,000 terms), the memory needed to load the 
terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB.  
Random access is 1-2% faster, segment load time is ~40% faster, and full GCs 
on my JVM are 7 times faster.

I have already done the work and am offering this code as a patch.  
Currently all tests in the trunk pass with this new code enabled.  I also wrote 
a system property switch that allows the original implementation to be used 
as well.

-Dorg.apache.lucene.index.TermInfosReader=default or small
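
As a rough sketch of how such a switch could be read (not the code from the patch), the snippet below checks the system property named above. The class name ReaderSwitch, the method names, and the choice to fall back to the original implementation when the property is unset or unrecognized are all assumptions.

```java
/** Hypothetical sketch only; the patch may read the property differently. */
public class ReaderSwitch {
    // Property name taken from the issue text.
    static final String PROPERTY = "org.apache.lucene.index.TermInfosReader";

    /** Returns true only when the value selects the "small" implementation. */
    static boolean isSmall(String value) {
        // Assumption: anything other than "small" (including null) means the original reader.
        return "small".equals(value);
    }

    /** Reads the switch from the JVM's system properties. */
    static boolean useSmallReader() {
        return isSmall(System.getProperty(PROPERTY));
    }
}
```

Under these assumptions, starting the JVM with -Dorg.apache.lucene.index.TermInfosReader=small would select the packed implementation, and any other setting would keep the original one.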

I have also written a blog post about this patch; here is the link.





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


