OK, I finally finished the classes that allow easy traversal of the Terms (base terms or actual terms) of a Query. It took longer than I expected, not because it's hard to collect terms, but because it was a bit tricky to make it both intuitive to use and powerful enough to work in most cases.
The result is a bit heavy in that I had to add a few new classes, but changes to existing classes were fairly minimal. I think this should solve one of the problems in writing highlighters. I'd love to get feedback on the implementation, and of course if/when bugs are found I'll need to fix them.

Anyway, there are basically 3 ways to access the Terms of a Query:

- Use SimpleTermCollector's collectBaseTerms(); this fills a Collection with all base Terms (unexpanded Terms; wildcard query terms still contain "*" and "?", etc.).
- Use ActualTermCollector's collectActualTerms() (you need to pass an IndexReader); it works like collectBaseTerms() but the Collection contains all actual terms (Terms expanded to the Terms found in the index, accessed using the passed-in IndexReader).
- Get a TermQueryIterator using ActualTermCollector's termQueryIterator() method.

In the simplest cases the first 2 methods are enough. However, if more information about Term context and type is needed, the Iterator gives access to most of the info you might want to know (you can check which Query a Term was contained in, and whether that Query is required/prohibited/optional).

I haven't added full test cases yet, but ActualTermCollector has a main() method that does simple testing given the user's input. It also shows how to traverse Query Terms using TermQueryIterator (and the base/actual term iterators it can give).

-+ Tatu +-

ps. About attachments: the zip file contains the new classes, which live in the org.apache.lucene.search package; the txt file contains patches taken from org/apache/lucene/.
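For readers who want a feel for the traversal before opening the zip, here is a minimal, self-contained sketch of the callback pattern that the collectBaseTerms() methods in the patch follow. It is not the attached code: the Collector, Query, Clause and query classes are simplified, hypothetical stand-ins (Terms are plain Strings, and only two collector callbacks are modelled), and the collect() and sample() helpers exist only for this illustration. The flag constants and the per-clause flag handling mirror the Query.java and BooleanQuery.java hunks.

```java
import java.util.ArrayList;
import java.util.List;

public class TermCollectSketch {
    // Flag values as defined in the Query.java hunk of the patch.
    static final int F_NONE = 0x0000;
    static final int F_REQUIRED = 0x0001;
    static final int F_PROHIBITED = 0x0002;

    /** Hypothetical stand-in for BaseTermCollector (two callbacks only). */
    interface Collector {
        void collectBaseTerm(String term, int flags);
        void collectBaseWildcardTerm(String pattern, int flags);
    }

    /** Hypothetical stand-in for Query. */
    interface Query {
        void collectBaseTerms(Collector collector, int flags);
    }

    static final class TermQuery implements Query {
        private final String term;
        TermQuery(String term) { this.term = term; }
        public void collectBaseTerms(Collector c, int flags) {
            c.collectBaseTerm(term, flags);
        }
    }

    static final class WildcardQuery implements Query {
        private final String pattern;
        WildcardQuery(String pattern) { this.pattern = pattern; }
        public void collectBaseTerms(Collector c, int flags) {
            // The unexpanded pattern is handed over; "*" and "?" are kept.
            c.collectBaseWildcardTerm(pattern, flags);
        }
    }

    /** Simplified stand-in for BooleanClause. */
    static final class Clause {
        final Query query;
        final boolean required;
        final boolean prohibited;
        Clause(Query query, boolean required, boolean prohibited) {
            this.query = query;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    static final class BooleanQuery implements Query {
        private final List<Clause> clauses = new ArrayList<>();

        BooleanQuery add(Query q, boolean required, boolean prohibited) {
            clauses.add(new Clause(q, required, prohibited));
            return this;
        }

        public void collectBaseTerms(Collector c, int flags) {
            // Same per-clause logic as the BooleanQuery hunk: reset both
            // flags, then set the one that matches the clause.
            for (Clause cl : clauses) {
                flags &= ~(F_REQUIRED | F_PROHIBITED);
                if (cl.required) {
                    flags |= F_REQUIRED;
                } else if (cl.prohibited) {
                    flags |= F_PROHIBITED;
                }
                cl.query.collectBaseTerms(c, flags);
            }
        }
    }

    /** Renders flags the way the query syntax would: "+", "-" or nothing. */
    static String prefix(int flags) {
        if ((flags & F_REQUIRED) != 0) return "+";
        if ((flags & F_PROHIBITED) != 0) return "-";
        return "";
    }

    /** Illustration-only helper: collects base terms as prefixed strings. */
    static List<String> collect(Query q) {
        final List<String> out = new ArrayList<>();
        q.collectBaseTerms(new Collector() {
            public void collectBaseTerm(String term, int flags) {
                out.add(prefix(flags) + term);
            }
            public void collectBaseWildcardTerm(String pattern, int flags) {
                out.add(prefix(flags) + pattern);
            }
        }, F_NONE);
        return out;
    }

    /** Illustration-only sample query: +lucene high* -slow */
    static Query sample() {
        return new BooleanQuery()
            .add(new TermQuery("lucene"), true, false)
            .add(new WildcardQuery("high*"), false, false)
            .add(new TermQuery("slow"), false, true);
    }

    public static void main(String[] args) {
        System.out.println(collect(sample())); // prints [+lucene, high*, -slow]
    }
}
```

The point of the sketch is that each query type decides which collector callback to invoke, so the collector sees both the base term and the kind of query it came from, while the boolean query only adjusts the required/prohibited flags on the way down.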
? search/ActualTermCollector.java
? search/BaseTermCollector.java
? search/SimpleTermCollector.java
? search/TermIterator.java
? search/TermIterators.java
? search/TermQueries.java
? search/TermQueryIterator.java
Index: search/BooleanClause.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/BooleanClause.java,v
retrieving revision 1.5
diff -u -r1.5 BooleanClause.java
--- search/BooleanClause.java 13 Jan 2003 23:50:33 -0000 1.5
+++ search/BooleanClause.java 14 Mar 2003 06:20:41 -0000
@@ -83,6 +83,10 @@
&& (this.prohibited == other.prohibited);
}
+ public Query getQuery() {
+ return query;
+ }
+
/** Returns a hash code value for this object.*/
public int hashCode() {
return query.hashCode() ^ (this.required?1:0) ^ (this.prohibited?2:0);
Index: search/BooleanQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/BooleanQuery.java,v
retrieving revision 1.14
diff -u -r1.14 BooleanQuery.java
--- search/BooleanQuery.java 7 Feb 2003 18:45:15 -0000 1.14
+++ search/BooleanQuery.java 14 Mar 2003 06:20:41 -0000
@@ -228,6 +228,30 @@
return this; // no clauses rewrote
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in. Will be modified
+ * by this query, based on BooleanClause settings.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ /* Need to traverse contained queries (in order they were added),
+ * and call their collectBaseTerms method
+ */
+ for (int i = 0 ; i < clauses.size(); i++) {
+ BooleanClause c = (BooleanClause)clauses.elementAt(i);
+ flags &= ~(F_REQUIRED | F_PROHIBITED);
+ if (c.required) {
+ flags |= F_REQUIRED;
+ } else if (c.prohibited) {
+ flags |= F_PROHIBITED;
+ } else {
+ ; // No flags to add
+ }
+ c.getQuery().collectBaseTerms(collector, flags);
+ }
+ }
public Object clone() {
BooleanQuery clone = (BooleanQuery)super.clone();
@@ -238,7 +262,7 @@
/** Prints a user-readable version of this query. */
public String toString(String field) {
StringBuffer buffer = new StringBuffer();
- if (getBoost() != 1.0) {
+ if (hasBoost()) {
buffer.append("(");
}
Index: search/FuzzyQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/FuzzyQuery.java,v
retrieving revision 1.3
diff -u -r1.3 FuzzyQuery.java
--- search/FuzzyQuery.java 13 Jan 2003 23:50:33 -0000 1.3
+++ search/FuzzyQuery.java 14 Mar 2003 06:20:41 -0000
@@ -68,6 +68,16 @@
return new FuzzyTermEnum(reader, getTerm());
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBaseFuzzyTerm(this, getTerm(), flags);
+ }
+
public String toString(String field) {
return super.toString(field) + '~';
}
Index: search/MultiTermQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/MultiTermQuery.java,v
retrieving revision 1.9
diff -u -r1.9 MultiTermQuery.java
--- search/MultiTermQuery.java 29 Jan 2003 17:18:54 -0000 1.9
+++ search/MultiTermQuery.java 14 Mar 2003 06:20:42 -0000
@@ -124,4 +124,10 @@
}
return buffer.toString();
}
+
+ /**
+ * This method can not yet be implemented by this intermediate
+ * class, as it needs to call type-dependent method in collector.
+ */
+ public abstract void collectBaseTerms(BaseTermCollector collector, int flags);
}
Index: search/PhrasePrefixQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/PhrasePrefixQuery.java,v
retrieving revision 1.8
diff -u -r1.8 PhrasePrefixQuery.java
--- search/PhrasePrefixQuery.java 29 Jan 2003 17:18:54 -0000 1.8
+++ search/PhrasePrefixQuery.java 14 Mar 2003 06:20:42 -0000
@@ -248,6 +248,19 @@
return new PhrasePrefixWeight(searcher);
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ ArrayList termList = termArrays;
+ int len = termList.size();
+ Term[][] terms = (Term[][]) termList.toArray(new Term[len][]);
+ collector.collectBasePhrasePrefixTerms(this, terms, flags);
+ }
+
/** Prints a user-readable version of this query. */
public final String toString(String f) {
StringBuffer buffer = new StringBuffer();
Index: search/PhraseQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/PhraseQuery.java,v
retrieving revision 1.11
diff -u -r1.11 PhraseQuery.java
--- search/PhraseQuery.java 29 Jan 2003 17:18:55 -0000 1.11
+++ search/PhraseQuery.java 14 Mar 2003 06:20:42 -0000
@@ -240,6 +240,15 @@
return new PhraseWeight(searcher);
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBasePhraseTerms(this, getTerms(), flags);
+ }
/** Prints a user-readable version of this query. */
public String toString(String f) {
Index: search/PrefixQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/PrefixQuery.java,v
retrieving revision 1.6
diff -u -r1.6 PrefixQuery.java
--- search/PrefixQuery.java 29 Jan 2003 17:18:55 -0000 1.6
+++ search/PrefixQuery.java 14 Mar 2003 06:20:42 -0000
@@ -98,6 +98,20 @@
return Query.mergeBooleanQueries(queries);
}
+ public Term getTerm() {
+ return prefix;
+ }
+
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBasePrefixTerm(this, getTerm(), flags);
+ }
+
/** Prints a user-readable version of this query. */
public String toString(String field) {
StringBuffer buffer = new StringBuffer();
Index: search/Query.java
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/Query.java,v
retrieving revision 1.12
diff -u -r1.12 Query.java
--- search/Query.java 20 Jan 2003 18:40:19 -0000 1.12
+++ search/Query.java 14 Mar 2003 06:20:43 -0000
@@ -81,6 +81,14 @@
public abstract class Query implements java.io.Serializable, Cloneable {
private float boost = 1.0f; // query boost factor
+ /* Query property flags passed when (recursively) collecting Terms
+ * using a {@link BaseTermCollector}.
+ */
+ public final static int F_REQUIRED = 0x0001;
+ public final static int F_PROHIBITED = 0x0002;
+
+ public final static int F_NONE = 0x0000; // No flags
+
/** Sets the boost for this query clause to <code>b</code>. Documents
* matching this clause will (in addition to the normal weightings) have
* their score multiplied by <code>b</code>.
@@ -93,6 +101,14 @@
*/
public float getBoost() { return boost; }
+ /** Checks whether Query has an explicit boost, that is, something other
+ * than default boost of 1.0.
+ *
+ * @return True if query has explicit boost different from default
+ * boost value; false otherwise.
+ */
+ public boolean hasBoost() { return boost != 1.0f; }
+
/** Prints a query to a string, with <code>field</code> as the default field
* for terms. <p>The representation used is one that is readable by {@link
* org.apache.lucene.queryParser.QueryParser QueryParser} (although, if the
@@ -171,4 +187,15 @@
throw new RuntimeException("Clone not supported: " + e.getMessage());
}
}
+
+ /** Method called to get a list of {@link Term}s in this Query and
+ * its subqueries (if any). Query is expected to do pre-traversal
+ * for all of Terms it contains, recursively if necessary.
+ *
+ * @param collector Term collector to hand Terms contained to
+ * @param flags Property flags of the Query as passed from containing
+ * Queries; things like whether Term(s) contained are required, prohibited
+ * or optional.
+ */
+ public abstract void collectBaseTerms(BaseTermCollector collector, int flags);
}
Index: search/RangeQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/RangeQuery.java,v
retrieving revision 1.7
diff -u -r1.7 RangeQuery.java
--- search/RangeQuery.java 29 Jan 2003 17:18:55 -0000 1.7
+++ search/RangeQuery.java 14 Mar 2003 06:20:43 -0000
@@ -88,6 +88,10 @@
this.inclusive = inclusive;
}
+ public Term getLowerTerm() { return lowerTerm; }
+ public Term getUpperTerm() { return upperTerm; }
+ public boolean isInclusive() { return inclusive; }
+
public Query rewrite(IndexReader reader) throws IOException {
BooleanQuery query = new BooleanQuery();
// if we have a lowerTerm, start there. otherwise, start at beginning
@@ -140,10 +144,21 @@
return Query.mergeBooleanQueries(queries);
}
- private String getField()
+ public String getField()
{
return (lowerTerm != null ? lowerTerm.field() : upperTerm.field());
}
+
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBaseRangeTerms(this, lowerTerm, upperTerm, flags);
+ }
+
/** Prints a user-readable version of this query. */
public String toString(String field)
Index: search/TermQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/TermQuery.java,v
retrieving revision 1.7
diff -u -r1.7 TermQuery.java
--- search/TermQuery.java 15 Jan 2003 19:25:04 -0000 1.7
+++ search/TermQuery.java 14 Mar 2003 06:20:43 -0000
@@ -171,6 +171,16 @@
return new TermWeight(searcher);
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBaseTerm(this, term, flags);
+ }
+
/** Prints a user-readable version of this query. */
public String toString(String field) {
StringBuffer buffer = new StringBuffer();
@@ -199,5 +209,4 @@
public int hashCode() {
return Float.floatToIntBits(getBoost()) ^ term.hashCode();
}
-
}
Index: search/WildcardQuery.java
===================================================================
RCS file:
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/WildcardQuery.java,v
retrieving revision 1.3
diff -u -r1.3 WildcardQuery.java
--- search/WildcardQuery.java 13 Jan 2003 23:50:33 -0000 1.3
+++ search/WildcardQuery.java 14 Mar 2003 06:20:43 -0000
@@ -68,4 +68,13 @@
return new WildcardTermEnum(reader, getTerm());
}
+ /** Implementation of {@link Query#collectBaseTerms}
+ *
+ * @param collector base term collector to pass terms to.
+ * @param flags Query/Term property flags passed in.
+ */
+ public void collectBaseTerms(BaseTermCollector collector, int flags)
+ {
+ collector.collectBaseWildcardTerm(this, getTerm(), flags);
+ }
}
querySrc.zip
Description: Zip archive
