On Friday 09 September 2005 00:34, Doug Cutting wrote:
> Paul Elschot wrote:
> > I suppose one of these cases are when many terms are used in a query.
> > Would it be easily possible to make the buffer size for a term iterator
> > depend on the numbers of documents to be iterated?
> > Many terms only occur in a few documents, so this could be a
> > nice win on total buffer size for the many terms case.
>
> This would not be too difficult.
>
> Look in SegmentTermDocs.java. The buffer may be allocated when the
> parent's stream is first cloned, but clone() won't allocate a buffer if
> the source hasn't had a buffer allocated yet, and nothing should perform
> i/o directly on the parent's freqStream, so in practice a buffer should
> not be allocated until the first read is performed on the clone.

I tried delaying the buffer allocation in BufferedIndexInput by using
this clone() method:

  public Object clone() {
    BufferedIndexInput clone = (BufferedIndexInput)super.clone();
    clone.buffer = null;
    clone.bufferLength = 0;
    clone.bufferPosition = 0;
    clone.bufferStart = getFilePointer();
    return clone;
  }

With this, all term document iterators seem to be empty: no query in the
test cases gives any results, for example in TestDemo and TestBoolean2.
As far as I can see, delaying the allocation this way should work, but it
doesn't, and I have no idea why.

> So one could add an BufferedIndexInput.setBufferSize() method and then
> call it in SegmentTermDocs.seek(TermInfo), when the df is known and a
> buffer has not yet been allocated.

Indeed, that looks easy enough. Now, if I could only delay the buffer
allocation...

I noticed that RAMIndexInput extends BufferedIndexInput. It already has
all its data in buffers, so why is there another layer of buffering?

Regards,
Paul Elschot
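
The delayed allocation discussed above can be sketched in isolation. The class below is a hypothetical stand-in, not Lucene's BufferedIndexInput: the name LazyBufferedInput, the isBufferAllocated() helper, and the byte array standing in for the file are assumptions made for illustration, and setBufferSize() is only the method proposed in this thread. It shows the intended behaviour (a clone gets no buffer until its first read, so its size can still be chosen), and does not explain why the patched clone() fails in the real code.

```java
// Hypothetical stand-in for a buffered index input; NOT the Lucene class.
class LazyBufferedInput implements Cloneable {
    static final int DEFAULT_BUFFER_SIZE = 1024;

    private final byte[] file;        // stands in for the underlying file
    private int bufferSize = DEFAULT_BUFFER_SIZE;
    private byte[] buffer;            // stays null until the first read
    private long bufferStart = 0;     // file position of buffer[0]
    private int bufferLength = 0;     // number of valid bytes in buffer
    private int bufferPosition = 0;   // next byte to read within buffer

    LazyBufferedInput(byte[] file) { this.file = file; }

    // Assumed API: only legal while no buffer has been allocated yet.
    void setBufferSize(int size) {
        if (buffer != null) throw new IllegalStateException("buffer already allocated");
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        bufferSize = size;
    }

    boolean isBufferAllocated() { return buffer != null; }

    long getFilePointer() { return bufferStart + bufferPosition; }

    void seek(long pos) {
        if (pos >= bufferStart && pos < bufferStart + bufferLength) {
            bufferPosition = (int) (pos - bufferStart);  // still inside the buffer
        } else {
            bufferStart = pos;                           // force a refill on next read
            bufferPosition = 0;
            bufferLength = 0;
        }
    }

    byte readByte() {
        if (bufferPosition >= bufferLength) refill();
        return buffer[bufferPosition++];
    }

    private void refill() {
        long start = bufferStart + bufferPosition;
        long end = Math.min(start + bufferSize, file.length);
        int length = (int) (end - start);
        // A real implementation would throw IOException here.
        if (length <= 0) throw new IllegalStateException("read past EOF");
        if (buffer == null) buffer = new byte[bufferSize];  // lazy allocation
        System.arraycopy(file, (int) start, buffer, 0, length);
        bufferStart = start;
        bufferLength = length;
        bufferPosition = 0;
    }

    @Override
    public LazyBufferedInput clone() {
        try {
            LazyBufferedInput c = (LazyBufferedInput) super.clone();
            c.buffer = null;                      // the clone allocates on first read
            c.bufferLength = 0;
            c.bufferPosition = 0;
            c.bufferStart = getFilePointer();     // continue where the source is
            return c;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);          // cannot happen: we are Cloneable
        }
    }
}
```

In this sketch a caller playing the role of SegmentTermDocs.seek(TermInfo) would clone the stream, call setBufferSize() with a size derived from the document frequency, and only then start reading.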