Text Search Parser added

Jens Hübel Tue, 05 Jul 2011 07:30:34 -0700

Hi, Chemistries


just a quick note: yesterday I checked in the code for parsing text search 
queries in a CONTAINS statement. Please check whether it breaks anything on 
your servers.

 

The text search parser is implemented as a completely separate parser and 
lexer in its own grammar. Using it is optional: you can configure the parser 
so that you get either a CONTAINS string literal as before or a parsed 
tree. There are some new support methods that help with unescaping. The text 
search parser is integrated with our parsing framework for simpler query 
integration.

 

One component that needs review is the JCR connector. Integrating the parser 
broke some tests, so I changed the code to use the compatibility mode. In case 
the JCR connector can benefit, I added a code template showing how to integrate 
the full text parser. This still needs to be completed. If it does not make 
sense for the JCR connector, please remove the code I added.

 

The InMemory server uses the full text parser and can now do a (very 
simplistic) full text search. It does not do any kind of preprocessing, so 
it only makes sense for plain text files. If you store HTML content and search 
for 'body', you will get a hit for every document. It does not build any kind 
of index; it uses a grep-like search, so don't expect great performance. 
Currently there is no ranking implemented. See the unit tests for details.
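
To illustrate what "grep-like" means here, the following is a minimal 
standalone sketch, not the actual InMemory server code. The class name 
GrepLikeSearch, the method search and the case-insensitive matching are 
assumptions made for the example. It scans the raw content of every document 
for the search term, which is why HTML markup produces hits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; names are not taken from the InMemory server.
public class GrepLikeSearch {

    // Returns the ids of all documents whose raw content contains the term.
    // No index, no preprocessing, no ranking: every document is scanned in full.
    // Case-insensitive matching is an assumption of this sketch.
    static List<String> search(Map<String, String> contentById, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentById.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("plain.txt", "Notes on the text search parser");
        docs.put("a.html", "<html><body>Hello</body></html>");
        docs.put("b.html", "<html><body>World</body></html>");

        System.out.println(search(docs, "parser")); // prints [plain.txt]
        System.out.println(search(docs, "body"));   // prints [a.html, b.html]
    }
}
```

The second query shows the limitation: both HTML documents match 'body' only 
because of their markup, and the cost grows with the total size of all stored 
content since nothing is indexed.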

 

Jens
