RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

Itamar Syn-Hershko Wed, 12 May 2010 03:06:03 -0700

The QueryParser also fails to parse Hebrew acronyms correctly. This is not
strictly part of the current discussion, but I thought this would be the
best place to bring it up.


Hebrew acronyms are written as letters with a single double-quote char
inside the word, for example MNK"L (Hebrew for CEO). That double-quote char
usually sits in the second-to-last position of the word, but in some cases
it comes earlier (MNK"LIT). Since the QP expects a pair of double-quotes
enclosing a phrase, an exception will be thrown if such a word is passed to
it, or an incorrect phrase query will be produced if two acronyms are used
together in a query string. Not sure which is worse.

Perhaps while you're at it you could make sure a phrase query is only
created when a quote is followed by a space, and is therefore definitely at
the end of a word, rather than assuming every quote is equivalent to
whitespace?
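
To make the idea concrete, here is a minimal sketch in plain Java with no
Lucene dependency. It is not a patch to the QueryParser grammar, only a
pre-processing step, and the class and method names are hypothetical. It uses
a slightly stricter variant of the rule above: a double-quote with a letter
or digit on both sides is treated as part of the word and backslash-escaped
(the classic query syntax accepts a backslash escape for special characters),
while a quote at a word boundary is left alone as a phrase delimiter.

```java
// Hypothetical helper, not part of Lucene.
final class AcronymQuotes {
    private AcronymQuotes() {}

    /**
     * Returns the query string with a backslash inserted before every
     * double-quote that has a letter or digit immediately on both sides.
     * Quotes at the start or end of a word are returned unchanged.
     * Works on UTF-16 chars, so it assumes text in the Basic Multilingual
     * Plane (which covers Hebrew).
     */
    static String escapeInWordQuotes(String query) {
        StringBuilder out = new StringBuilder(query.length() + 4);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            boolean inWord = c == '"'
                    && i > 0
                    && Character.isLetterOrDigit(query.charAt(i - 1))
                    && i + 1 < query.length()
                    && Character.isLetterOrDigit(query.charAt(i + 1));
            if (inWord) {
                out.append('\\'); // keep the quote as a literal character
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

With this, MNK"L becomes MNK\"L, while "hello world" is left untouched and
still reads as a phrase. The trade-off is that a query such as foo"bar baz"
(an opening quote glued to the previous word) would no longer start a phrase,
so whether this is acceptable depends on how strict the parser should be.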

There is no good open Hebrew analyzer for Lucene yet, and hence no
motivation for this to be fixed, but I'm working on one as we speak and
hopefully will have something to show in the next few days or weeks. It
would be nice to have at least this issue closed within the Lucene core code.

Thanks,

Itamar Syn-Hershko


