Michael:

Thanks, that's what I figured, but it's nice to have confirmed.

Erick

On Jan 20, 2008 11:59 AM, Michael McCandless <[EMAIL PROTECTED]>
wrote:

>
> Hi Erick,
>
> Yes, you do still need to guard against this case in 2.3.  IndexWriter
> checks the RAM usage after each doc is processed and flushes when
> that's over the limit.
>
> However, the memory consumed by a very large doc should be quite a bit
> less than before, because in 2.3 IndexWriter makes more more efficient
> use of RAM.
>
> Mike
>
> Erick Erickson wrote:
>
> > About flush by RAM
> >
> > I was playing around with something similar on the 2.1 codebase
> > (roll-my-own)
> > and had the quirk of a possible *very* large incoming document. As
> > in 250M.
> > So I had to put some logic in to try say, in effect, "if the
> > incoming doc is
> > completely ridiculous, flush now". I should say that I was
> > impressed that
> > we could even index the bloody thing at all!
> >
> > Is this something that still needs to be guarded against in 2.3? In
> > other
> > words, should the flush size be chosen so that (current RAM size + the
> > increment caused by the biggest doc possible in your data set) be <
> > the
> > threshold?
> >
> > You see the problem here. In the silliest case, where I have one HUGE
> > document
> > that barely fit in memory, I'd have to set the threshold very low,
> > flushing
> > early
> > and often unless there was a "flush now" bit of logic for silly docs.
> >
> > If you must know, the huge doc was the 23 volume "encyclopedia of
> > Michigan
> > Civil War Volunteers". Yeah, yeah, sure. I could have done other
> > things than
> > index it as a single doc, but since indexing speed wasn't really an
> > issue
> > and
> > the PM wanted it that way and all it meant was that indexing took 6
> > hours
> > rather than, perhaps, 4 on a static data set, I didn't care enough
> > to do
> > more work.
> >
> > Don't get me wrong, having a flush by RAM size is sweet. And for any
> > reasonable
> > corpus, especially one with relatively constant input docs, it
> > should be
> > very nice
> > indeed. I'm wondering about the outlier cases since I seem to run
> > into them,
> > siiiggghh.
> > But "that's why they pay me the big bucks" <G>.
> >
> > Thanks
> > Erick
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Reply via email to