See my similar request from last March:
http://www.nabble.com/FieldSortedHitQueue-enhancement-to9733550.html#a9733550
Peter
On Dec 11, 2007 11:54 AM, Nadav Har'El <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance Improvement for
> Search using PriorityQueue":
> > Hi
> >
> > Lucene's PQ implements two methods: put (assumes the PQ has room for the
> > object) and insert (checks whether the object can be inserted etc.). The
> > implementation of insert() requires the application that uses it to
> allocate
> > a new object every time it calls insert. Specifically, it cannot reuse
> the
> > objects that were removed from the PQ.
>
> I've read this entire thread, and would like to add my comments about
> three
> independent issues, which I think that can and perhaps should be
> considered
> separately:
>
> 1. When Shai wanted to add the insertWithOverflow() method to
> PriorityQueue
> he couldn't just subclass PriorityQueue in his application, but rather
> was forced to modify PriorityQueue itself. Why? just because one field
> of that classi - "heap" - was defined "private" instead of "protected".
>
> Is there a special reason for that? If not, can we make the trivial
> change
> to make PriorityQueue's fields protected, to allow Shai and others (see
> the
> next point) to add functionality on top of PriorityQueue?
>
> 2. PriorityQueue, in addition to being used in about a dozen places inside
> Lucene, is also a public class that advanced users often find useful
> when
> implementing features like new collectors, new queries, and so on.
> Unfortunately, my experience exactly matches Shai's: In the two
> occasions
> where I used a PriorityQueue, I found that I needed such a
> insertWithOverflow() method. If this feature is so useful (I can't
> believe
> that Shai and me are the only ones who found it useful), I think it
> would
> be nice to add it to Lucene's PriorityQueue, even if it isn't (yet) used
> inside Lucene.
>
> Just to make it sound more interesting, let me give you an example where
> I needed (and implemented) an "insertWithOverflow()": I was implementing
> a
> faceted search capability over Lucene. It calculated a count for each
> facet value, and then I used a PriorityQueue to find the 10 best values.
> The problem is that I also needed an "other" aggregator, which was
> supposed
> to aggregate (in various ways) all the facets except the 10 best ones.
> For
> that, I needed to know which facets dropped off the priorityqueue.
>
> 3. Finally, Shai asked for this new PriorityQueue.insertWithOverflow()
> to be used in TopDocCollector. I have to admit I don't know how much
> of a benefit this will be in the "typical" case. But I do know that
> there's no such thing as a "typical" case...
> I can easily think of a quite typical "worst case" though: Consider a
> collection indexed in order of document age (a pretty typical scenario
> for a long-running index), and then you do a sorting query
> (TopFieldDocCollector), asking it to bring the 10 newest documents.
> In that case, each and every document will have a new DocScore created -
> which is the worst-case that Shai feared.
> It would be nice if Shai or someone else could provide a measurement in
> that case.
>
> P.S. When looking now at PriorityQueue's code, I found two tiny
> performance improvements that could be easily made to it - I wonder if
> there's any reason not to do them:
>
> 1. Insert can use heap[1] directly instead of calling top(). After all,
> this is done in an if() that already ensures that size>0.
>
> 2. Regardless, top() could return heap[1] always, without any if(). After
> all, the heap array is initialized to all nulls, so when size==0,
> heap[1]
> is null anyway.
>
>
>
> > PriorityQueue change (added insertWithOverflow method)
> >
> -----------------------------------------------------------------------------------
> > /**
> > * insertWithOverflow() is similar to insert(), except its return
> value:
> > it
> > * returns the object (if any) that was dropped off the heap because
> it
> > was
> > * full. This can be the given parameter (in case it is smaller than
> the
> > * full heap's minimum, and couldn't be added) or another object
> that
> > was
> > * previously the smallest value in the heap and now has been
> replaced
> > by a
> > * larger one.
> > */
> > public Object insertWithOverflow(Object element) {
> > if (size < maxSize) {
> > put(element);
> > return null;
> > } else if (size > 0 && !lessThan(element, top())) {
> > Object ret = heap[1];
> > heap[1] = element;
> > adjustTop();
> > return ret;
> > } else {
> > return element;
> > }
> > }
> > [Very similar to insert(), only it returns the object that was kicked
> out of
> > the Queue, or null]
>
> --
> Nadav Har'El | Tuesday, Dec 11 2007, 3 Tevet
> 5768
> IBM Haifa Research Lab
> |-----------------------------------------
> |A professor is one who talks in
> someone
> http://nadav.harel.org.il |else's sleep.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>