I originally sent this only to Mick, but he wanted me to send it to the list 
as well.

Hi, I have been working on the parser for a bit now, and have some suggestions 
that I would like you to comment on. 

1)

I have solved the problem with parsing () or any mismatched parentheses by 
adding (<OPENP>|<CLOSEP>)* to the leaf clause that we have.

This also makes the qs = balance(qs, '(', ')'); redundant.

I think this is a better solution than adding the balance function back and 
writing another function that removes empty () or () containing only skip 
characters. It accepts more queries with non-matching parentheses, and it 
will probably perform better.

2)

I think pre-processing the query string is a bad idea, and that we should 
fix how the parser is set up instead. To make the qs = even(qs, '\"'); call 
redundant, we can add "\"" to SKIP and change the QUOTED_WORD token like 
this:

-TOKEN : { <QUOTED_WORD: "\"" (~[])* "\""> }
+TOKEN : { <QUOTED_WORD: "\"" (~["\""])+ "\""> }
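To illustrate the difference between the two token definitions, here is a rough analog using java.util.regex rather than JavaCC. The class and method names are made up for the example, and the greedy pattern only approximates how I understand JavaCC's longest-match rule to treat (~[])*.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the parser.
class QuotedWordDemo {

    // Analog of the old token: a quote, any characters at all, a quote.
    private static final Pattern OLD_QUOTED_WORD = Pattern.compile("\".*\"", Pattern.DOTALL);

    // Analog of the new token: a quote, one or more non-quote characters, a quote.
    private static final Pattern NEW_QUOTED_WORD = Pattern.compile("\"[^\"]+\"");

    static List<String> matches(final Pattern pattern, final String query) {
        final List<String> result = new ArrayList<>();
        final Matcher m = pattern.matcher(query);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    static List<String> oldQuotedWords(final String query) {
        return matches(OLD_QUOTED_WORD, query);
    }

    static List<String> newQuotedWords(final String query) {
        return matches(NEW_QUOTED_WORD, query);
    }

    public static void main(final String[] args) {
        // Two phrases followed by a stray, unpaired quote.
        final String query = "\"a b\" c \"d e\" \"";
        System.out.println(oldQuotedWords(query)); // one match spanning the whole string
        System.out.println(newQuotedWords(query)); // two matches: "a b" and "d e"
    }
}
```

With the new pattern the unpaired quote at the end is not part of any match, which is the case that adding "\"" to SKIP is meant to cover in the real grammar.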

3)

I removed this code:

-    | <#WORD_SEPARATOR: [ // just a copy of the SKIP declaration. see SKIP comment!
-            " ", "!",
-            "\u0023"-"\u0029",
-            "\u003b"-"\u0040",
-            "\u005b"-"\u0060",
-            "\u007b"-"\u00bf",
-            "\u00d7",
-            "\u00f7",
-            "\u2010"-"\u2015"
-        ]>

since it is not used.

4)

When we do sub parsing, we should protect it and fall back if it fails.

This applies to the sub parsing of the quoted words:

+                try { // if we can parse the content again, then make an xorclause
+                    final QueryParserImpl p = new QueryParserImpl(createContext(term), QUOTED_WORD_DISABLED);
+                    final Clause altClause = p.parse();
+                    return context.createXorClause(phClause, altClause, XorClause.Hint.PHRASE_ON_LEFT);
+
+                }
+                catch (ParseException e) {
+                    LOG.warn("Parsing content of QUOTED_WORD: " + term, e);
+                }
+
+                return phClause;
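
The same fallback pattern, reduced to a self-contained sketch. Everything here is a stand-in made up for illustration: clauses are plain strings, and SubParser and the nested ParseException only mimic the shape of QueryParserImpl and its exception, they are not the real classes.

```java
// Hypothetical demo class, not part of the parser.
class SubParseFallbackDemo {

    /** Stand-in for the parser's ParseException. */
    static class ParseException extends Exception {
        ParseException(final String message) {
            super(message);
        }
    }

    /** Stand-in for a sub parser that may reject the quoted content. */
    interface SubParser {
        String parse(String term) throws ParseException;
    }

    static String phraseOrXor(final String term, final SubParser subParser) {
        // The phrase clause is always available, so it is the safe fallback.
        final String phClause = "phrase(" + term + ")";
        try {
            // If the content parses again, offer both interpretations.
            final String altClause = subParser.parse(term);
            return "xor(" + phClause + ", " + altClause + ")";
        } catch (ParseException e) {
            // Log and fall through to the plain phrase clause.
            System.err.println("Parsing content of QUOTED_WORD: " + term + " (" + e.getMessage() + ")");
        }
        return phClause;
    }

    public static void main(final String[] args) {
        System.out.println(phraseOrXor("a b", t -> "words(" + t + ")"));
        System.out.println(phraseOrXor("a (", t -> {
            throw new ParseException("unbalanced parenthesis");
        }));
    }
}
```

The point of the structure is that a failure in the sub parser never propagates to the outer parse: the worst case is that the quoted content is treated as a plain phrase.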

5)

I added the token as a parameter to the enter method, for better 
debugging.

----

This was what I wanted to talk to you about.

Hope you are well, Håvard.

_______________________________________________
Kernel-development mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-development
