This is a suggestion how to make code with PriorityQueue more consistent and easier to understand. I am not talking about a bug. The current code seems to work fine.
Currently there are two different kinds of applications of PriorityQueue. In SegmentMergeQueue, PhraseQueue, and TermPositionsQueue one knows in advance the maximum number of elements in the queue and the queue is only used to order the elements. One is never tempted to add more elements than the specified number to the queue. With HitQueue one wants to find the n best hits within an a priori unknown number of hits.
My problem:
1) PriorityQueue currently is initialized with a maximum size, however it actually can hold twice the number of elements. This feature is only used by HitQueue which currently must be able to hold one element more than maxSize. This is required by the code in IndexSearcher and MultiSearcher.
2) I find the code in IndexSearcher and MultiSearcher for controlling the size of the queue a little bit confusing. Furthermore, if the queue is already full and a new element has a higher score than top(), it is more efficient to change the top-element and call adjustTop() than to add the new element and call pop() afterwards (what is currently done).
Suggestion:
I�d like to introduce a clearer contract about the maxSize in PriorityQueue. PriorityQueue should be able to hold exactly maxSize elements and put() should throw an exception if one accidently tries to add more elements. Furthermore I�d like to add a method insert() to PriorityQueue that only adds a new element if either the queue is not full or !lessThan(element, top()). This new method can then be used in IndexSearcher and MultiSearcher.
Patch files for PriorityQueue, IndexSearcher, MultiSearcher, and TestPriorityQueue (one additional test) are attached.
kind regards, Christoph
-- ***************************************************************** * Dr. Christoph Goller Tel.: +49 89 203 45734 * * Detego Software GmbH Mobile: +49 179 1128469 * * Keuslinstr. 13 Fax.: +49 721 151516176 * * 80798 M�nchen, Germany Email: [EMAIL PROTECTED] * *****************************************************************
Index: PriorityQueue.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/util/PriorityQueue.java,v
retrieving revision 1.3
diff -u -r1.3 PriorityQueue.java
--- PriorityQueue.java 7 Nov 2002 05:55:40 -0000 1.3
+++ PriorityQueue.java 4 Sep 2003 13:29:40 -0000
@@ -60,6 +60,7 @@
public abstract class PriorityQueue {
private Object[] heap;
private int size;
+ private int maxSize;
/** Determines the ordering of objects in this priority queue. Subclasses
must define this one method. */
@@ -68,16 +69,41 @@
/** Subclass constructors must call this. */
protected final void initialize(int maxSize) {
size = 0;
- int heapSize = (maxSize * 2) + 1;
+ int heapSize = maxSize + 1;
heap = new Object[heapSize];
+ this.maxSize = maxSize;
}
- /** Adds an Object to a PriorityQueue in log(size) time. */
+ /**
+ * Adds an Object to a PriorityQueue in log(size) time.
+ * If one tries to add more objects than maxSize from initialize
+ * a RuntimeException (ArrayIndexOutOfBound) is thrown.
+ */
public final void put(Object element) {
size++;
heap[size] = element;
upHeap();
}
+
+ /**
+ * Adds element to the PriorityQueue in log(size) time if either
+ * the PriorityQueue is not full, or !lessThan(element, top()).
+ * @param element
+ * @return true if element is added, false otherwise.
+ */
+ public boolean insert(Object element){
+ if(size < maxSize){
+ put(element);
+ return true;
+ }
+ else if(size > 0 && !lessThan(element, top())){
+ heap[1] = element;
+ adjustTop();
+ return true;
+ }
+ else
+ return false;
+ }
/** Returns the least element of the PriorityQueue in constant time. */
public final Object top() {
Index: IndexSearcher.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/IndexSearcher.java,v
retrieving revision 1.9
diff -u -r1.9 IndexSearcher.java
--- IndexSearcher.java 24 May 2003 15:24:26 -0000 1.9
+++ IndexSearcher.java 4 Sep 2003 12:41:44 -0000
@@ -133,18 +133,11 @@
final HitQueue hq = new HitQueue(nDocs);
final int[] totalHits = new int[1];
scorer.score(new HitCollector() {
- private float minScore = 0.0f;
public final void collect(int doc, float score) {
if (score > 0.0f && // ignore zeroed buckets
(bits==null || bits.get(doc))) { // skip docs not in bits
totalHits[0]++;
- if (score >= minScore) {
- hq.put(new ScoreDoc(doc, score)); // update hit queue
- if (hq.size() > nDocs) { // if hit queue overfull
- hq.pop(); // remove lowest in hit queue
- minScore = ((ScoreDoc)hq.top()).score; // reset minScore
- }
- }
+ hq.insert(new ScoreDoc(doc, score));
}
}
}, reader.maxDoc());
Index: MultiSearcher.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/MultiSearcher.java,v
retrieving revision 1.10
diff -u -r1.10 MultiSearcher.java
--- MultiSearcher.java 29 Jan 2003 17:18:54 -0000 1.10
+++ MultiSearcher.java 4 Sep 2003 12:43:13 -0000
@@ -144,7 +144,6 @@
public TopDocs search(Query query, Filter filter, int nDocs)
throws IOException {
HitQueue hq = new HitQueue(nDocs);
- float minScore = 0.0f;
int totalHits = 0;
for (int i = 0; i < searchables.length; i++) { // search each searcher
@@ -153,15 +152,9 @@
ScoreDoc[] scoreDocs = docs.scoreDocs;
for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
ScoreDoc scoreDoc = scoreDocs[j];
- if (scoreDoc.score >= minScore) {
- scoreDoc.doc += starts[i]; // convert doc
- hq.put(scoreDoc); // update hit queue
- if (hq.size() > nDocs) { // if hit queue overfull
- hq.pop(); // remove lowest in hit queue
- minScore = ((ScoreDoc)hq.top()).score; // reset minScore
- }
- } else
- break; // no more scores > minScore
+ scoreDoc.doc += starts[i]; // convert doc
+ if(!hq.insert(scoreDoc))
+ break; // no more scores >
minScore
}
}
Index: TestPriorityQueue.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/util/TestPriorityQueue.java,v
retrieving revision 1.3
diff -u -r1.3 TestPriorityQueue.java
--- TestPriorityQueue.java 5 Jun 2002 01:50:54 -0000 1.3
+++ TestPriorityQueue.java 4 Sep 2003 13:39:34 -0000
@@ -135,4 +135,16 @@
pq.clear();
assertEquals(0, pq.size());
}
+
+ public void testFixedSize(){
+ PriorityQueue pq = new IntegerQueue(3);
+ pq.insert(new Integer(2));
+ pq.insert(new Integer(3));
+ pq.insert(new Integer(1));
+ pq.insert(new Integer(5));
+ pq.insert(new Integer(7));
+ pq.insert(new Integer(1));
+ assertEquals(3, pq.size());
+ assertEquals(3, ((Integer) pq.top()).intValue());
+ }
}--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
