Hi all,

I would like to contribute to Lucene by updating QueryParser so
that it allows suffix queries.
I patched my copy of Lucene to allow *term in search strings, so that such terms are valid WILDTERM tokens.


I've seen the discussions on this topic and I understand why the Lucene developers don't want to support suffix queries, but supporting them is one of the requirements of our project, because they are very useful for the German language.
I would suggest making it possible to turn this functionality on and off.


My problem is that I am not familiar with JavaCC. Can anyone help me implement this?

 Thanks in advance,

  Sergiu


Sergiu,

I'm swamped for time myself as well as inexperienced with JavaCC other than minor tweaks. So your best bet is to ask questions on the lucene-user or -dev lists.

I suspect there is a more JavaCC-centric way to allow the grammar to be more dynamic than your suggestion.

Thanks for tackling this.

    Erik


On Feb 9, 2005, at 12:55 PM, sergiu gordea wrote:

Hi Erik,

I was proposing to update QueryParser to allow the construction of suffix queries.
I took a look at QueryParser.jj and QueryParser.java and figured out how to implement this functionality.


I want to add the field
boolean allowSuffixQueries = false;
and getter, setter methods.
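
A minimal sketch of what I mean by the field and its accessors. The class name SuffixQueryOptions is hypothetical and only keeps the example self-contained; in the actual patch the field and methods would sit in QueryParser itself, and the accessor names are my assumption.

```java
// Hypothetical holder for the proposed option; in the real patch the field
// and its accessors would live in QueryParser itself.
class SuffixQueryOptions {
    // Suffix queries stay disabled unless explicitly enabled.
    private boolean allowSuffixQueries = false;

    public boolean getAllowSuffixQueries() {
        return allowSuffixQueries;
    }

    public void setAllowSuffixQueries(boolean allowSuffixQueries) {
        this.allowSuffixQueries = allowSuffixQueries;
    }

    public static void main(String[] args) {
        SuffixQueryOptions options = new SuffixQueryOptions();
        System.out.println(options.getAllowSuffixQueries());
        options.setAllowSuffixQueries(true);
        System.out.println(options.getAllowSuffixQueries());
    }
}
```

Keeping the default at false means existing callers would see no change in behaviour unless they opt in.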

The definition of WILDTERM in QueryParser.jj must be changed from:
<WILDTERM: (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
to:
<WILDTERM: (<_TERM_START_CHAR> (<_TERM_CHAR> | ( [ "*", "?" ] ))* )
| ( [ "*", "?" ] <_TERM_START_CHAR> (<_TERM_CHAR> | ( [ "*", "?" ] ) )* ) >


(Maybe this is not the best definition, but we have been using it for six months and have not had any problems with it.)

This definition will allow suffix queries. To prevent them when the option is turned off, I would suggest updating the Clause method in the following way:

final public Query Clause(String field) throws ParseException {
  Query q;
  Token fieldToken=null, boost=null;
  if (jj_2_1(2)) {
    fieldToken = jj_consume_token(TERM);
    jj_consume_token(COLON);
    field=discardEscapeChar(fieldToken.image);
  } else {
    ;
  }
  switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
  case QUOTED:
  case TERM:
  case PREFIXTERM:
  case WILDTERM:
    // Check the upcoming term token rather than fieldToken: fieldToken holds
    // the field name and is null when the clause has no "field:" prefix.
    // The cases above fall through to this check, which is harmless because
    // none of those tokens can start with "*" or "?".
    if(!allowSuffixQueries && isSuffixQuery(getToken(1).image))
      throw new ParseException("suffix queries are not allowed! You must set allowSuffixQueries to true!");
  case RANGEIN_START:
  case RANGEEX_START:
  case NUMBER:
    q = Term(field);
    break;
  case LPAREN:
    jj_consume_token(LPAREN);
    q = Query(field);
    jj_consume_token(RPAREN);
    switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
    case CARAT:
      jj_consume_token(CARAT);
      boost = jj_consume_token(NUMBER);
      break;
    default:
      jj_la1[5] = jj_gen;
      ;
    }
    break;
  default:
    jj_la1[6] = jj_gen;
    jj_consume_token(-1);
    throw new ParseException();
  }
  if (boost != null) {
    float f = (float)1.0;
    try {
      f = Float.valueOf(boost.image).floatValue();
      q.setBoost(f);
    } catch (Exception ignored) { }
  }
  {if (true) return q;}
  throw new Error("Missing return statement in function");
}


boolean isSuffixQuery(String s){
  return s.startsWith("*") || s.startsWith("?");
}

I didn't find "*" and "?" defined as constants; the literals above should be replaced with constants.
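
A rough sketch of the same check with the wildcard characters pulled out into constants. The class name SuffixQueryCheck, the constant names, and the checkAllowed helper are all hypothetical and not part of Lucene. I also use IllegalArgumentException here only so that the example compiles on its own; inside QueryParser it would be a ParseException.

```java
// Hypothetical helper, not part of Lucene: shows the suffix-query check
// with the wildcard literals replaced by named constants.
class SuffixQueryCheck {
    // Assumed constant names; the parser may already define equivalents.
    static final String WILDCARD_ANY = "*";
    static final String WILDCARD_ONE = "?";

    // True when the term begins with a wildcard, i.e. it is a suffix query.
    static boolean isSuffixQuery(String term) {
        if (term == null) {
            return false;
        }
        return term.startsWith(WILDCARD_ANY) || term.startsWith(WILDCARD_ONE);
    }

    // Rejects suffix queries unless the option is switched on.
    static void checkAllowed(String term, boolean allowSuffixQueries) {
        if (!allowSuffixQueries && isSuffixQuery(term)) {
            throw new IllegalArgumentException(
                "suffix queries are not allowed! You must set allowSuffixQueries to true!");
        }
    }

    public static void main(String[] args) {
        System.out.println(isSuffixQuery("*term"));
        System.out.println(isSuffixQuery("term*"));
        checkAllowed("*term", true);
    }
}
```

The null guard is an extra safety measure on my side; the token image passed in by the parser should never be null.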


As I am not very familiar with JavaCC, I don't know how to apply these changes in the QueryParser.jj file.
Could you help me a little with applying these changes to the code?
I will then create some JUnit tests to check the behaviour.



Thanks in advance,

Sergiu




