my $.02.
Before having patches, I think it's a good idea to agree on what the
"right" solution is. Most of it is obvious using boolean logic, but we
have some additional requirements like not having a query that only has
a NOT clause. Is this the only exception?
As far as the actual patch, I would suspect that a better approach than
using java would be to use precedence operations in the actual parser.
I've never used javacc, and it's been years since I've used yacc/bison,
but one of the basic capbilities in parsers is to define precedence. It
should be quite easy to fix it this way, and it should be more "bullet
proof." I looked a bit at the javacc code, but I don't really have the
time right now to analyze it. It certainly seems like the strategy of
having all the operators together is problematic:
<DEFAULT> TOKEN : {
<AND: ("AND" | "&&") >
| <OR: ("OR" | "||") >
| <NOT: ("NOT" | "!") >
| <PLUS: "+" >
| <MINUS: "-" >
| <LPAREN: "(" >
| <RPAREN: ")" >
| <COLON: ":" >
| <CARAT: "^" > : Boost
| <QUOTED: "\"" (~["\""])+ "\"">
| <TERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* >
| <FUZZY: "~" >
| <SLOP: "~" (<_NUM_CHAR>)+ >
| <PREFIXTERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
| <WILDTERM: <_TERM_START_CHAR>
(<_TERM_CHAR> | ( [ "*", "?" ] ))* >
|
<RANGEIN_START: "[" > : RangeIn
|
<RANGEEX_START: "{" > : RangeEx
}
Something like http://www.lysator.liu.se/c/ANSI-C-grammar-y.html where
different operators are grouped differently according to precedence
would work better.
As is often the case, trying to *correctly* parse a string is not
trivial.
Regards,
Dror
On Sun, Dec 28, 2003 at 07:11:22PM -0500, Erik Hatcher wrote:
> Morus,
>
> I haven't had time to think through all of the issues and the patch you
> submitted, but I suggest that you go ahead and attach this to a
> Bugzilla issue so that it can be addressed more formally and avoid
> being lost in the mounds of e-mail we all get.
>
> Thanks,
> Erik
>
>
> On Dec 28, 2003, at 11:46 AM, Morus Walter wrote:
>
> >Morus Walter writes:
> >>
> >>I attached the patch (made against 1.3rc3 but working for 1.3final as
> >>well)
> >>and a test program.
> >
> >Seems the attachments got stripped...
> >
> >So once again:
> >
> >The patch:
> >
> >===File lucene/QueryParser.jj.patch===============
> >*** QueryParser.jj.org Mon Dec 22 11:47:30 2003
> >--- QueryParser.jj Mon Dec 22 13:20:57 2003
> >***************
> >*** 233,255 ****
> >
> > protected void addClause(Vector clauses, int conj, int mods, Query
> >q) {
> > boolean required, prohibited;
> >!
> >! // If this term is introduced by AND, make the preceding term
> >required,
> > // unless it's already prohibited
> > if (conj == CONJ_AND) {
> >! BooleanClause c = (BooleanClause)
> >clauses.elementAt(clauses.size()-1);
> >! if (!c.prohibited)
> >! c.required = true;
> >! }
> >!
> >! if (operator == DEFAULT_OPERATOR_AND && conj == CONJ_OR) {
> >! // If this term is introduced by OR, make the preceding term
> >optional,
> >! // unless it's prohibited (that means we leave -a OR b but +a
> >OR b-->a OR b)
> >! // notice if the input is a OR b, first term is parsed as
> >required; without
> >! // this modification a OR b would parsed as +a OR b
> >! BooleanClause c = (BooleanClause)
> >clauses.elementAt(clauses.size()-1);
> >! if (!c.prohibited)
> >! c.required = false;
> > }
> >
> > // We might have been passed a null query; the term might have
> >been
> >--- 233,249 ----
> >
> > protected void addClause(Vector clauses, int conj, int mods, Query
> >q) {
> > boolean required, prohibited;
> >! // System.out.println(conj+ " " + mods + " " +
> >q.toString("text"));
> >! // If this term is introduced by AND, check if the previous term
> >is the
> >! // first term in this or-group and make that term required,
> > // unless it's already prohibited
> > if (conj == CONJ_AND) {
> >! Vector clauses2 = (Vector)clauses.elementAt(clauses.size()-1);
> >! //if ( clauses2.size() == 1 ) {
> >! BooleanClause c = (BooleanClause)
> >clauses2.elementAt(clauses2.size()-1);
> >! if (!c.prohibited)
> >! c.required = true;
> >! //}
> > }
> >
> > // We might have been passed a null query; the term might have
> >been
> >***************
> >*** 257,277 ****
> > if (q == null)
> > return;
> >
> > if (operator == DEFAULT_OPERATOR_OR) {
> >- // We set REQUIRED if we're introduced by AND or +; PROHIBITED
> >if
> > // introduced by NOT or -; make sure not to set both.
> > prohibited = (mods == MOD_NOT);
> >! required = (mods == MOD_REQ);
> >! if (conj == CONJ_AND && !prohibited) {
> >! required = true;
> >! }
> >! } else {
> >! // We set PROHIBITED if we're introduced by NOT or -; We set
> >REQUIRED
> >! // if not PROHIBITED and not introduced by OR
> > prohibited = (mods == MOD_NOT);
> >! required = (!prohibited && conj != CONJ_OR);
> > }
> >! clauses.addElement(new BooleanClause(q, required, prohibited));
> > }
> >
> > /**
> >--- 251,279 ----
> > if (q == null)
> > return;
> >
> >+ // start new or-group if there's an explit or
> >+ if ( conj == CONJ_OR ) {
> >+ clauses.addElement(new Vector());
> >+ }
> >+
> > if (operator == DEFAULT_OPERATOR_OR) {
> > // introduced by NOT or -; make sure not to set both.
> > prohibited = (mods == MOD_NOT);
> >! // for explizit conjunctions: set required to true
> >! if ( conj == CONJ_AND ) {
> >! required = true;
> >! }
> >! else {
> >! // default OR -> required only when requested
> >! required = (mods == MOD_REQ);
> >! }
> >! } else { // operator == DEFAULT_OPERATOR_AND
> >! // We set PROHIBITED if we're introduced by NOT or -
> > prohibited = (mods == MOD_NOT);
> >! // always REQUIRED unless PROHIBITED
> >! required = (!prohibited);
> > }
> >! ((Vector)clauses.elementAt(clauses.size()-1)).addElement(new
> >BooleanClause(q, required, prohibited));
> > }
> >
> > /**
> >***************
> >*** 359,369 ****
> > */
> > protected Query getBooleanQuery(Vector clauses) throws
> >ParseException
> > {
> >! BooleanQuery query = new BooleanQuery();
> >! for (int i = 0; i < clauses.size(); i++) {
> >! query.add((BooleanClause)clauses.elementAt(i));
> >! }
> >! return query;
> > }
> >
> > /**
> >--- 361,389 ----
> > */
> > protected Query getBooleanQuery(Vector clauses) throws
> >ParseException
> > {
> >! BooleanQuery query = new BooleanQuery();
> >! if ( clauses.size() == 1 ) {
> >! clauses = (Vector)clauses.elementAt(0);
> >! for (int i = 0; i < clauses.size(); i++) {
> >! query.add((BooleanClause)clauses.elementAt(i));
> >! }
> >! }
> >! else {
> >! for ( int i = 0; i < clauses.size(); i++ ) {
> >! Vector clauses2 = (Vector)clauses.elementAt(i);
> >! if ( clauses2.size() == 1 &&
> >((BooleanClause)clauses2.elementAt(0)).prohibited == false ) {
> >! query.add(new
> >BooleanClause(((BooleanClause)clauses2.elementAt(0)).query, false,
> >false));
> >! }
> >! else if ( clauses2.size() >= 1 ) {
> >! BooleanQuery query2 = new BooleanQuery();
> >! for ( int j = 0; j < clauses2.size(); j++ ) {
> >! query2.add((BooleanClause)clauses2.elementAt(j));
> >! }
> >! query.add(new BooleanClause(query2, false, false));
> >! }
> >! }
> >! }
> >! return query;
> > }
> >
> > /**
> >***************
> >*** 551,556 ****
> >--- 571,577 ----
> > Query Query(String field) :
> > {
> > Vector clauses = new Vector();
> >+ clauses.addElement(new Vector());
> > Query q, firstQuery=null;
> > int conj, mods;
> > }
> >***************
> >*** 566,572 ****
> > { addClause(clauses, conj, mods, q); }
> > )*
> > {
> >! if (clauses.size() == 1 && firstQuery != null)
> > return firstQuery;
> > else {
> > return getBooleanQuery(clauses);
> >--- 587,593 ----
> > { addClause(clauses, conj, mods, q); }
> > )*
> > {
> >! if (clauses.size() == 1 &&
> >((Vector)clauses.elementAt(0)).size() == 1 && firstQuery != null)
> > return firstQuery;
> > else {
> > return getBooleanQuery(clauses);
> >============================================================
> >
> >and the test program:
> >
> >===File lucene/LuceneTest.java===============
> >import org.apache.lucene.document.*;
> >import org.apache.lucene.analysis.*;
> >import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >import org.apache.lucene.index.*;
> >import org.apache.lucene.store.*;
> >import org.apache.lucene.search.*;
> >import org.apache.lucene.queryParser.QueryParser;
> >
> >class LuceneTest
> >{
> > static String[] docs = {
> > "a", "b", "c", "d",
> > "a b", "a c", "a d", "b c", "b d", "c d",
> > "a b c", "a b d", "a c d", "b c d",
> > "a b c d"
> > };
> >
> > static String[] queries = {
> > "a OR b AND c",
> > "(a OR b) AND c",
> > "a OR (b AND c)",
> > "a AND b",
> > "a AND b OR c AND d",
> > "(a AND b) OR (c AND d)",
> > "a AND (b OR c) AND d",
> > "((a AND b) OR c) AND d",
> > "a AND (b OR (c AND d))",
> > "a AND b AND c AND d",
> >
> > "a OR b AND NOT c",
> > "(a OR b) AND NOT c",
> > "a OR (b AND NOT c)",
> > "a AND NOT d",
> > "a AND NOT b OR c AND NOT d",
> > "(a AND NOT b) OR (c AND NOT d)",
> > "a AND NOT (b OR c) AND NOT d",
> > "((a AND NOT b) OR c) AND NOT d",
> > "a AND NOT (b OR (c AND NOT d))",
> > "a AND NOT b AND NOT c AND NOT d",
> >
> > "a OR NOT b",
> > "a OR NOT a",
> >
> > "a b",
> > "a b c",
> > "a b (c d e)",
> > "+a +b",
> > "a -b",
> > "a +b -c",
> > "+a b -c",
> > "+a -b c",
> > "a -b -c",
> > "-a b c",
> >
> > "a OR b c AND d",
> > "a OR b c",
> > "a AND b c",
> > "a OR b c OR d",
> > "a OR b c d OR e",
> > "a AND b c AND d",
> > "a AND b c d AND e"
> > };
> >
> > public static void main(String argv[]) throws Exception {
> > Directory dir = new RAMDirectory();
> > String[] stop = {};
> > Analyzer analyzer = new StandardAnalyzer(stop);
> >
> > IndexWriter writer = new IndexWriter(dir, analyzer, true);
> >
> > for ( int i=0; i < docs.length; i++ ) {
> > Document doc = new Document();
> > doc.add(Field.Text("text", docs[i]));
> > writer.addDocument(doc);
> > }
> > writer.close();
> >
> > Searcher searcher = new IndexSearcher(dir);
> > for ( int i=0; i < queries.length; i++ ) {
> > QueryParser parser = new QueryParser("text", analyzer);
> > parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
> >
> > Query [] query = new Query[4];
> >
> > query[0] = QueryParser.parse(queries[i], "text", analyzer);
> > query[1] = QueryParser.parse(query[0].toString("text"), "text",
> >analyzer);
> > query[2] = parser.parse(queries[i]);
> > query[3] = QueryParser.parse(query[2].toString("text"), "text",
> >analyzer);
> >
> > System.out.println(i + ": " + queries[i] + " ==> " +
> >query[0].toString("text") + " -> " + query[1].toString("text") + " / "
> >+ query[2].toString("text") + " -> " + query[3].toString("text"));
> > if ( argv.length > 0 && argv[0].equals("-q") ) {
> > for ( int k=0; k<4; k++ ) {
> > Hits hits = searcher.search(query[k]);
> > System.out.println(k + " " + query[k].toString("text") +
> > "\t" + hits.length() + " documents found");
> > for ( int j=0; j < hits.length(); j++ ) {
> > Document doc = hits.doc(j);
> > System.out.println("\t"+doc.get("text"));
> > }
> > }
> > }
> > }
> > }
> >}
> >============================================================
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]