Doug Cutting wrote:
>Your analysis looks good to me.
>
>I think it would be simpler, if a bit less optimized, to just make
>SegmentsReader.numDocs() and SegmentsReader.delete() synchronized methods.
>Does that sound like a reasonable fix to you?
>
Yes, that sounds fine. Delete can definetely just be a synchronized
method. And so can the numDocs unless it is called a lot. Is it? If it
is, we may want to leave the upfront check in there before it is
synchronized.
>
>
>Thanks for spotting this.
>
No problem :). I'm learning a lot by taking the code apart and figuring
out what goes where. It is very impressive! :) A lot of optimizations in
terms of pre-computing numerical stuff, keeping the memory foot print
low. The use of PriorityQueues everywhere. Speaking of which, I was just
going through the SegmentsTermEnum and noticed that in the next() method
it does a queue.pop() followed by queue.put(), instead of the
queue.adjustTop() which the comment says is at least two times faster. I
guess the adjustTop was added after this code was written, but maybe I'm
missing something or this queue is small enough that this doesn't
matter. I wasn't going to change it or bring it up, but since we are
talking ... Here's the code:
==================================
public final boolean next() throws IOException {
SegmentMergeInfo top = (SegmentMergeInfo)queue.top();
if (top == null) {
term = null;
return false;
}
term = top.term;
docFreq = 0;
while (top != null && term.compareTo(top.term) == 0) {
queue.pop();
docFreq += top.termEnum.docFreq(); // increment freq
if (top.next())
queue.put(top); // restore queue
else
top.close(); // done with a segment
top = (SegmentMergeInfo)queue.top();
}
return true;
}
=============================
>
>
>As for closing, your analysis also sounds correct: if an index is closed
>from one thread while another thread is reading it, then the other thread
>may experience errors. I think the a good solution here is simply to not
>close an IndexReader when there may be other threads that may be using it.
>FSDirectory closes files with finalize methods, so all files will be closed
>when the IndexReader is garbage collected. The primary client of the
>close() method is the index merging code, which requires prompt closing of
>files to keep the number of file descriptors low, and knows that other
>threads cannot be accessing them.
>
>So I think changing the documentation and examples will suffice here. The
>documentation for IndexReader.close() method should note that it should only
>be called if no other threads are accessing the index, and the search
>examples should not bother to call close, letting the garbage collector
>close files. Is that acceptable?
>
Sure. Too bad, though!
I liked your implementation of the Lock.With and I was hoping to see how
you would solve this synchronization problem where you have readers and
writers instead of just a single lock. Is there a good solution to this
problem you could point me to?