On Friday 09 September 2005 00:34, Doug Cutting wrote:
> Paul Elschot wrote:
> > I suppose one of these cases is when many terms are used in a query.
> > Would it be easily possible to make the buffer size for a term iterator
> > depend on the number of documents to be iterated?
> > Many terms only occur in a few documents, so this could be a 
> > nice win on total buffer size for the many terms case.
> 
> This would not be too difficult.
> 
> Look in SegmentTermDocs.java.  The buffer may be allocated when the 
> parent's stream is first cloned, but clone() won't allocate a buffer if 
> the source hasn't had a buffer allocated yet, and nothing should perform 
> i/o directly on the parent's freqStream, so in practice a buffer should 
> not be allocated until the first read is performed on the clone.

I tried delaying the buffer allocation in BufferedIndexInput by
using this clone() method:

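  public Object clone() {
    BufferedIndexInput clone = (BufferedIndexInput)super.clone();
    // Drop the buffer; it should be allocated on the clone's first read.
    clone.buffer = null;
    clone.bufferLength = 0;
    clone.bufferPosition = 0;
    // Start the clone at the current read position of the source.
    clone.bufferStart = getFilePointer();
    return clone;
  }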

With this change, all term document iterators appear to be empty:
no query in the test cases returns any results, for example in
TestDemo and TestBoolean2.
As far as I can see, delaying the allocation this way should work,
but it doesn't, and I have no idea why.
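
To make the intent concrete, here is the idea in isolation as a
self-contained toy. LazyBufferedInput is a hypothetical stand-in backed
by a byte array, not Lucene's BufferedIndexInput, so it only shows what
the clone() above is meant to do and says nothing about why the real
class misbehaves:

```java
// Toy stand-in for a buffered input; NOT Lucene's BufferedIndexInput.
// All names here are hypothetical and chosen only for illustration.
class LazyBufferedInput implements Cloneable {
  static final int BUFFER_SIZE = 4;   // tiny on purpose, to force refills

  private final byte[] file;          // stands in for the underlying file
  private byte[] buffer;              // stays null until the first read
  private long bufferStart = 0;       // file offset of buffer[0]
  private int bufferLength = 0;       // number of valid bytes in buffer
  private int bufferPosition = 0;     // next byte to read within buffer

  LazyBufferedInput(byte[] file) { this.file = file; }

  long getFilePointer() { return bufferStart + bufferPosition; }

  boolean hasBuffer() { return buffer != null; }

  byte readByte() {
    if (bufferPosition >= bufferLength) refill();
    return buffer[bufferPosition++];
  }

  private void refill() {
    long start = bufferStart + bufferPosition;
    if (start >= file.length) throw new IllegalStateException("read past EOF");
    if (buffer == null) buffer = new byte[BUFFER_SIZE];  // lazy allocation
    bufferLength = (int) Math.min(BUFFER_SIZE, file.length - start);
    System.arraycopy(file, (int) start, buffer, 0, bufferLength);
    bufferStart = start;
    bufferPosition = 0;
  }

  @Override
  public LazyBufferedInput clone() {
    try {
      LazyBufferedInput c = (LazyBufferedInput) super.clone();
      c.buffer = null;                   // no buffer until the clone reads
      c.bufferLength = 0;
      c.bufferPosition = 0;
      c.bufferStart = getFilePointer();  // continue where the source is
      return c;
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);       // cannot happen: we are Cloneable
    }
  }
}
```

In this toy, a clone taken after two reads starts at file pointer 2, has
no buffer yet, and allocates one on its first readByte().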

> So one could add a BufferedIndexInput.setBufferSize() method and then 
> call it in SegmentTermDocs.seek(TermInfo), when the df is known and a 
> buffer has not yet been allocated.

Indeed that looks easy enough. Now, if I could only delay
the buffer allocation...
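
For the sizing itself, something along these lines might do. This is
only a sketch: TermBufferSizing and bufferSizeFor() are hypothetical
names, the 4 bytes per document is a rough guess rather than a
measurement, and the default buffer size of 1024 is an assumption:

```java
// Hypothetical helper: pick a buffer size from a term's document frequency.
final class TermBufferSizing {
  static final int DEFAULT_BUFFER_SIZE = 1024; // assumed default, used as the cap
  static final int MIN_BUFFER_SIZE = 16;       // hypothetical floor
  static final int BYTES_PER_DOC = 4;          // rough guess per posting entry

  private TermBufferSizing() {}

  /** Returns a buffer size in bytes, clamped to [MIN, DEFAULT]. */
  static int bufferSizeFor(int docFreq) {
    if (docFreq < 0) {
      throw new IllegalArgumentException("docFreq must be >= 0: " + docFreq);
    }
    // Use long arithmetic so a very large docFreq cannot overflow.
    long estimate = (long) docFreq * BYTES_PER_DOC;
    return (int) Math.max(MIN_BUFFER_SIZE, Math.min(DEFAULT_BUFFER_SIZE, estimate));
  }
}
```

With these numbers a term occurring in 10 documents would get a 40 byte
buffer instead of the full default, and frequent terms keep the default.
The result would be passed to the proposed setBufferSize() from
SegmentTermDocs.seek(TermInfo), before the first read on the clone.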

I noticed that RAMIndexInput extends BufferedIndexInput.
It has all data in buffers already, so why is there another
layer of buffering?

Regards,
Paul Elschot

