Hi Mark
As said in a previous mail, I'm very interested in your Parser and I'm
happy to hear you made progress , and implemented
Paragraph/Sentence proximity search functionality. :)
This is the killer feature for me!
and if the execution of the resulting query ( a mix containing
SpanQuery 's) is not (much) slower than using Boolean/Pharse-Query
Combos, it would allow me to forget our current "1 lucene-doc per
paragraph" Indexing Model.
But also the other features are very cool, like the DateParsing which I
strongly miss in the standard QueryParser !
So let me hear, when you have a version ready to be tested, and how I
can help.
-Laurent
PS: Some other notes
Query-time thesaurus expansion / General token to query expansion :
Takes advantage of a general find/replace feature, "expand" might map
to "(expander | expanded)" ... or any other valid syntax.
This I could also use, if can also do following ?
right now I've a little utility class which expands special strings
(syntax is to be disc.) to all combinations :
"fest[,e] hypothek[,en,a]"
-> fest hypothek;fest hypotheken;fest hypotheka;feste hypothek;feste
hypotheken;feste hypotheka
Note that there may be some limitations...but so far this has proved
to be pretty powerful
Would still be good to know the limitations you see right now...
Mark Miller wrote:
I have finally delved back into the Lucene Query parser that I started
a few months back. I am very closing to wrapping up it's initial
development. I am currently looking for anybody willing to help me out
with a little testing and maybe some design consultation (I am not
happy with the current range query syntax for one). If you have any
interested in using this parser and have a little time to help out,
please do. The parser is extremely customizable and you can basically
mold it into whatever you want. A brief outline of the feature set:
The basics from Lucene query parser are covered: escaping operators,
handling tokens at the same position, range queries, etc.
Default Operators are: & | ! ~ ( )
New operators can be defined and default operators can be hidden on
the fly.
Adds a proximity operator to the standard AND, OR, and ANDNOT
operators allowing for queries like:
(search bear) ~5 (snake & horse ~4 pope) | crazy query
The default space operator is customizable and can be made to bind
tighter than if you use the actual operator (the operator acts like
the actual operator but within parenthesis).
The order of operations for the operators is customizable. The default
order is |, &, ~, !, ( )...you can change it to whatever you want.
Query-time thesaurus expansion / General token to query expansion :
Takes advantage of a general find/replace feature, "expand" might map
to "(expander | expanded)" ... or any other valid syntax. There is
also a slower RegEx feature so that you can match tokens with a
Pattern and perform back reference enabled replacements. You can also
make the replacement behave as an operator...you might map NEAR to ~10
, creating a new operator that performs within 10 word proximity
searches.
Did You Mean feature using the SpellCheck contrib: if you search for
'date(Aug 3, 1952) & mackine | rabbit' you might get a suggestion of :
'date(Aug 3, 1952) & machine | rabbit'
Paragraph/Sentence proximity search functionality. You can inject
tokens to specify paragraph and sentence markers and perform
SpanNotWithin searches for paragraph sentence proximity searches.
Customizable date parser.
Everything is pretty much configurable on the fly.
Note that there may be some limitations...but so far this has proved
to be pretty powerful. I could sure use some testing help making it
production ready though. I will be putting a new website up for the
parser soon. Please send me a note if you can help out at all. When I
put up the jar you can just run it with Java -jar and it will provide
a console input to enter queries and see the Lucene Query generated.
- Mark Miller
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]