Re: improve how IndexWriter uses RAM to buffer added documents

Yonik Seeley Mon, 30 Apr 2007 06:19:12 -0700

On 4/30/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:

> After discussion on java-dev last time, I decided to retry the
> "persistent hash" approach, where the Postings hash lasts across many
> docs and then a single flush produces a partial segment containing all
> of those docs.  This is in contrast to the previous approach where
> each doc makes its own segment and then they are merged.
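
As a rough illustration of the "persistent hash" idea quoted above (not the actual patch): the sketch below keeps one in-memory postings map alive across several documents and emits everything in a single flush. The class name, the token-array input, and the {docID, freq} layout are hypothetical and chosen only to make the contrast with one-segment-per-doc visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one postings hash shared across many buffered docs. */
class PersistentPostingsSketch {
    // term -> list of {docID, freq} pairs, kept in RAM until flush()
    private final Map<String, List<int[]>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Buffer one already-tokenized document; nothing is written out yet. */
    int addDocument(String[] tokens) {
        int docId = nextDocId++;
        for (String term : tokens) {
            List<int[]> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            int[] last = list.isEmpty() ? null : list.get(list.size() - 1);
            if (last != null && last[0] == docId) {
                last[1]++;                        // same doc again: bump freq
            } else {
                list.add(new int[] {docId, 1});   // first occurrence in this doc
            }
        }
        return docId;
    }

    /** Single flush: hand back all buffered postings sorted by term, then reset. */
    TreeMap<String, List<int[]>> flush() {
        TreeMap<String, List<int[]>> segment = new TreeMap<>(postings);
        postings.clear();
        nextDocId = 0; // assumption: doc IDs restart for each flushed segment
        return segment;
    }
}
```

Here many calls to addDocument are followed by one flush, so no per-document segment is created and nothing has to be merged afterwards.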


> It turns out this is even faster than my previous approach,


Go, Mike, go!

> With this new approach, as I process each term in the document I
> immediately write the prox/freq in their compact (vints) format into
> shared byte[] buffers, rather than accumulating int[] arrays that then
> need to be re-processed into the vint encoding.  This speeds things up
> because we don't double-process the postings.


Good idea!

> It also uses less
> per-document RAM overhead because intermediate postings are stored as
> vints not as ints.
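
To make the RAM argument concrete, here is a minimal sketch of appending values to a growable byte[] in a variable-length encoding: 7 data bits per byte, low-order bits first, with the high bit marking that more bytes follow. The class and method names are hypothetical and this is not the code from the patch; it only shows why writing vints directly can be smaller than holding int[] arrays.

```java
import java.util.Arrays;

/** Hypothetical sketch: append ints to a growable byte[] in variable-length form. */
class VIntBufferSketch {
    private byte[] buf = new byte[16];
    private int upto = 0; // number of bytes used so far

    /** Write a non-negative int using 7 bits per byte, low-order bits first. */
    void writeVInt(int i) {
        while ((i & ~0x7F) != 0) {
            writeByte((byte) ((i & 0x7F) | 0x80)); // high bit set: more bytes follow
            i >>>= 7;
        }
        writeByte((byte) i);
    }

    private void writeByte(byte b) {
        if (upto == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2); // grow the buffer
        }
        buf[upto++] = b;
    }

    int size() {
        return upto;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, upto);
    }

    /** Decode one value starting at pos[0]; advances pos[0] past it. */
    static int readVInt(byte[] bytes, int[] pos) {
        byte b = bytes[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}
```

Any value below 128 takes a single byte in this form, compared with four bytes for the same value held in an int[], and the bytes are already in their final encoding so they do not need a second pass.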


I'm just trying to follow along at a high level...how do you handle
intermediate termdocs?

-Yonik
