I originally just sent this to Mick, but he wanted me to also send it to the
list.
Hi, I have been working on the parser for a bit now, and have some suggestions
that I would like you to comment on.
1)
I have solved the problem with parsing () or any mismatched parentheses by
adding (<OPENP>|<CLOSEP>)* to the leaf clause that we have.
This also makes the qs = balance(qs, '(', ')'); call redundant.
I think this is a better solution than adding the balance function back and
writing another function that removes empty () or () containing only skip
characters, because it will accept more queries with non-matching ()'s and it
will probably perform better.
2)
I think the pre-processing of the query string is a bad thing, and that we
should try to fix how we set the parser up instead. To make the qs =
even(qs, '\"'); call redundant we can add "\"" to SKIP, and change this:
-TOKEN : { <QUOTED_WORD: "\"" (~[])* "\""> }
+TOKEN : { <QUOTED_WORD: "\"" (~["\""])+ "\""> }
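To illustrate why the tighter token helps, here is a minimal sketch that uses
java.util.regex as a stand-in for the two token definitions. The class name
QuotedWordDemo and the helper firstMatch are made up for this example, and the
regexes only approximate JavaCC's longest-match behaviour, so treat it as an
illustration rather than what the generated parser does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedWordDemo {
    // Regex stand-in (assumption) for the old token: "\"" (~[])* "\""
    // (~[])* accepts any character, including further quotes.
    static final Pattern OLD = Pattern.compile("\"(?s:.*)\"");

    // Regex stand-in (assumption) for the new token: "\"" (~["\""])+ "\""
    // At least one character, and none of them may be a quote.
    static final Pattern NEW = Pattern.compile("\"[^\"]+\"");

    /** Returns the first match of the pattern in the input, or null if there is none. */
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] queries = { "\"foo\" bar \"baz\"", "\"\"", "foo \"bar" };
        for (String q : queries) {
            System.out.println(q + " | old: " + firstMatch(OLD, q)
                    + " | new: " + firstMatch(NEW, q));
        }
    }
}
```

With the old pattern the first query is swallowed as one big quoted word; with
the new one only "foo" matches. An empty "" or a single unmatched quote no
longer forms a QUOTED_WORD, so those quotes can be left to SKIP.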
3)
I removed this code:
- | <#WORD_SEPARATOR: [ // just a copy of the SKIP declaration. see SKIP comment!
- " ", "!",
- "\u0023"-"\u0029",
- "\u003b"-"\u0040",
- "\u005b"-"\u0060",
- "\u007b"-"\u00bf",
- "\u00d7",
- "\u00f7",
- "\u2010"-"\u2015"
- ]>
since it is not used.
4)
When we do sub-parsing, we should protect it and fall back if it fails.
This is for the sub-parsing of the quoted words.
+ try { // if we can parse the content again, then make an xorclause
+     final QueryParserImpl p = new QueryParserImpl(createContext(term), QUOTED_WORD_DISABLED);
+     final Clause altClause = p.parse();
+     return context.createXorClause(phClause, altClause, XorClause.Hint.PHRASE_ON_LEFT);
+
+ }
+ catch (ParseException e) {
+     LOG.warn("Parsing content of QUOTED_WORD: " + term, e);
+ }
+
+ return phClause;
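For anyone who wants to try the fallback idea in isolation, here is a small
self-contained sketch of the same control flow. Everything in it is a stand-in
I made up for the example (the SubParseFallbackDemo class, its own
ParseException, the subParse rule that rejects blank content, and strings in
place of Clause objects); it is not the real QueryParserImpl.

```java
public class SubParseFallbackDemo {
    /** Hypothetical stand-in for the parser's ParseException. */
    static class ParseException extends Exception {
        ParseException(String msg) { super(msg); }
    }

    /** Hypothetical sub-parser: fails when there is nothing but whitespace. */
    static String subParse(String term) throws ParseException {
        if (term.trim().isEmpty()) {
            throw new ParseException("nothing to parse in: '" + term + "'");
        }
        return "words(" + term.trim() + ")";
    }

    /** Builds xor(phrase, alt) when the sub-parse succeeds, else just the phrase. */
    static String quotedWord(String term) {
        final String phClause = "phrase(" + term + ")";
        try {
            // if we can parse the content again, then make an xor clause
            final String altClause = subParse(term);
            return "xor(" + phClause + ", " + altClause + ")";
        } catch (ParseException e) {
            // log and fall through to the plain phrase clause
            System.err.println("Parsing content of QUOTED_WORD: " + term
                    + " (" + e.getMessage() + ")");
        }
        return phClause;
    }

    public static void main(String[] args) {
        System.out.println(quotedWord("foo bar"));
        System.out.println(quotedWord("   "));
    }
}
```

The point is that a failure in the inner parse never escapes to the caller:
the worst case is that we lose the alternative clause and keep the phrase.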
5)
I added the token as a parameter to the enter method, for better
debugging.
----
This was what I wanted to talk to you about.
Hope you are well, Håvard.
_______________________________________________
Kernel-development mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-development