Shai Erera wrote:

Hi,
I will open an issue and create the patch. One thing I'm not sure of is the wouldBeInserted method you mentioned - in what context should it be used?
And ... lessThan shouldn't be public, it can stay protected.

Sorry, this is a method Peter suggested (see below) in order to add de-duping logic on top of the PQ.

We should do it separately (it's unrelated).

Peter can you open an issue for that one?  Thanks!

Mike


On 12/11/07, Michael McCandless <[EMAIL PROTECTED]> wrote:


I think we can't make lessThan public since that would cause
subclasses to fail to compile (ie this breaks backwards compatibility)?

Adding "wouldBeInserted()" seems OK?

Mike

Peter Keegan wrote:

See my similar request from last March:
http://www.nabble.com/FieldSortedHitQueue-enhancement-
to9733550.html#a9733550

Peter

On Dec 11, 2007 11:54 AM, Nadav Har'El <[EMAIL PROTECTED]>
wrote:

On Mon, Dec 10, 2007, Shai Erera wrote about "Performance
Improvement for
Search using PriorityQueue":
Hi

Lucene's PQ implements two methods: put (assumes the PQ has room
for the
object) and insert (checks whether the object can be inserted
etc.). The
implementation of insert() requires the application that uses it to
allocate
a new object every time it calls insert. Specifically, it cannot
reuse
the
objects that were removed from the PQ.

I've read this entire thread, and would like to add my comments about
three
independent issues, which I think that can and perhaps should be
considered
separately:

1. When Shai wanted to add the insertWithOverflow() method to
PriorityQueue
  he couldn't just subclass PriorityQueue in his application, but
rather
  was forced to modify PriorityQueue itself. Why? just because one
field
  of that classi - "heap" - was defined "private" instead of
"protected".

Is there a special reason for that? If not, can we make the trivial
change
  to make PriorityQueue's fields protected, to allow Shai and
others (see
the
  next point) to add functionality on top of PriorityQueue?

2. PriorityQueue, in addition to being used in about a dozen
places inside
  Lucene, is also a public class that advanced users often find
useful
when
implementing features like new collectors, new queries, and so on.
  Unfortunately, my experience exactly matches Shai's: In the two
occasions
  where I used a PriorityQueue, I found that I needed such a
insertWithOverflow() method. If this feature is so useful (I can't
believe
that Shai and me are the only ones who found it useful), I think it
would
  be nice to add it to Lucene's PriorityQueue, even if it isn't
(yet) used
  inside Lucene.

  Just to make it sound more interesting, let me give you an
example where
  I needed (and implemented) an "insertWithOverflow()": I was
implementing
a
  faceted search capability over Lucene. It calculated a count for
each
  facet value, and then I used a PriorityQueue to find the 10 best
values.
The problem is that I also needed an "other" aggregator, which was
supposed
  to aggregate (in various ways) all the facets except the 10 best
ones.
For
that, I needed to know which facets dropped off the priorityqueue.

3. Finally, Shai asked for this new
PriorityQueue.insertWithOverflow()
  to be used in TopDocCollector. I have to admit I don't know how
much
of a benefit this will be in the "typical" case. But I do know that
  there's no such thing as a "typical" case...
  I can easily think of a quite typical "worst case" though:
Consider a
  collection indexed in order of document age (a pretty typical
scenario
  for a long-running index), and then you do a sorting query
(TopFieldDocCollector), asking it to bring the 10 newest documents.
  In that case, each and every document will have a new DocScore
created -
  which is the worst-case that Shai feared.
  It would be nice if Shai or someone else could provide a
measurement in
  that case.

P.S. When looking now at PriorityQueue's code, I found two tiny
performance improvements that could be easily made to it - I
wonder if
there's any reason not to do them:

 1. Insert can use heap[1] directly instead of calling top().
After all,
   this is done in an if() that already ensures that size>0.

 2. Regardless, top() could return heap[1] always, without any if
(). After
all, the heap array is initialized to all nulls, so when size==0,
heap[1]
   is null anyway.



PriorityQueue change (added insertWithOverflow method)

------------------------------------------------------------------- --
--------------
    /**
     * insertWithOverflow() is similar to insert(), except its
return
value:
it
     * returns the object (if any) that was dropped off the heap
because
it
was
     * full. This can be the given parameter (in case it is
smaller than
the
* full heap's minimum, and couldn't be added) or another object
that
was
     * previously the smallest value in the heap and now has been
replaced
by a
     * larger one.
     */
    public Object insertWithOverflow(Object element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            Object ret = heap[1];
            heap[1] = element;
            adjustTop();
            return ret;
        } else {
            return element;
        }
    }
[Very similar to insert(), only it returns the object that was
kicked
out of
the Queue, or null]

--
Nadav Har'El                        |       Tuesday, Dec 11 2007,
3 Tevet
5768
IBM Haifa Research Lab
 |-----------------------------------------
                                   |A professor is one who talks in
someone
http://nadav.harel.org.il           |else's sleep.

------------------------------------------------------------------- --
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
Regards,

Shai Erera


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to