[
https://issues.apache.org/jira/browse/LUCENE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746802#action_12746802
]
Michael McCandless commented on LUCENE-1787:
--------------------------------------------
The big challenge here is back compat. I.e., if we make this fix (which is a
good fix!) and users then upgrade to 2.9, queries may suddenly stop hitting the
right documents, because those documents were indexed with the old
StandardAnalyzer that has this bug. I.e., the bug is "cached" in their index.
This is why we added "matchVersion" to StandardAnalyzer, but unfortunately we
don't yet have a clean way to honor matchVersion when the fix requires changes
to the JFlex grammar.
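
To make the back-compat concern concrete, here is a minimal sketch of version-gated
behavior. It does not use Lucene at all: the Version enum, the two regexes, and the
normalize method are all hypothetical stand-ins, and the regexes are only rough
approximations of the old and fixed acronym rules. The point is that the caller's
matchVersion selects which rule runs, so an index built with the old rule can still be
queried with the old rule.

```java
import java.util.regex.Pattern;

public class MatchVersionSketch {
    /** Hypothetical version marker; a stand-in, not Lucene's own Version class. */
    public enum Version { PRE_FIX, WITH_FIX }

    // Assumed simplification of the old rule: every letter is followed by a dot ("I.B.M.").
    private static final Pattern OLD_ACRONYM =
            Pattern.compile("[A-Za-z]\\.([A-Za-z]\\.)+");

    // Assumed simplification of the fixed rule: the trailing dot is optional.
    private static final Pattern NEW_ACRONYM =
            Pattern.compile("[A-Za-z](\\.[A-Za-z])+\\.?");

    /** Strips the dots if the rule selected by matchVersion treats the token as an acronym. */
    public static String normalize(String token, Version matchVersion) {
        Pattern acronym = (matchVersion == Version.WITH_FIX) ? NEW_ACRONYM : OLD_ACRONYM;
        return acronym.matcher(token).matches() ? token.replace(".", "") : token;
    }

    public static void main(String[] args) {
        // Same input, different output depending on which rule the caller asks for.
        System.out.println(normalize("I.B.M", Version.PRE_FIX));
        System.out.println(normalize("I.B.M", Version.WITH_FIX));
    }
}
```

In this sketch the switch is a one-line conditional because both rules are ordinary
regexes. With a generated JFlex scanner the rules are compiled into the tokenizer class
itself, which is presumably why gating on matchVersion is harder there.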
> Standard Tokenizer doesn't recognise I.B.M as Acronym, it requires it ends
> with a dot i.e I.B.M.
> ------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1787
> URL: https://issues.apache.org/jira/browse/LUCENE-1787
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.9
> Reporter: Paul taylor
> Attachments: LUCENE-1787.patch
>
>
> StandardTokenizer doesn't recognise I.B.M as an acronym; it requires that it
> end with a dot, i.e. "I.B.M.". This is particularly problematic if I.B.M is
> added to the index: with the StandardAnalyzer it gets added as IBM, but a
> search for I.B.M will not match because I.B.M is left as is. I would expect a
> match in this scenario.
> I think it could be fixed by modifying the ACRONYM_DEP grammar rule in
> StandardTokenizerImpl.jflex so that it also supports
> {ALPHANUM} ("." {ALPHANUM})+
> i.e. a dot is only required between characters. (I'm not familiar with JFlex
> syntax.)
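
The proposed pattern can be tried out as a plain Java regex before touching the grammar.
The sketch below is an approximation, not the JFlex rule itself: it assumes ASCII letters
and digits only, and it shows two possible readings of {ALPHANUM}. If the macro stands
for a single character, only dotted initials match; if it stands for a run of characters
(an assumption worth checking against the grammar file), host-like tokens match as well.
The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class AcronymPatternSketch {
    // Reading 1 (assumption): ALPHANUM is exactly one letter or digit.
    public static final Pattern SINGLE_CHAR =
            Pattern.compile("[A-Za-z0-9](\\.[A-Za-z0-9])+");

    // Reading 2 (assumption): ALPHANUM is a run of one or more letters or digits.
    public static final Pattern MULTI_CHAR =
            Pattern.compile("[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+");

    /** True if the whole token matches the given pattern. */
    public static boolean matches(Pattern pattern, String token) {
        return pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"I.B.M", "I.B.M.", "www.apache.org"}) {
            System.out.println(token
                    + " single=" + matches(SINGLE_CHAR, token)
                    + " multi=" + matches(MULTI_CHAR, token));
        }
    }
}
```

Neither reading matches a trailing dot, so the pattern would have to sit alongside the
existing trailing-dot rule rather than replace it, which is consistent with the "also
supports" wording above.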