[
https://issues.apache.org/jira/browse/LUCENE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842206#comment-13842206
]
Steve Rowe commented on LUCENE-5357:
------------------------------------
No problem Robert, thanks for taking a look.
About back-compat: none of the JFlex-based tokenizers on trunk have
version-based behavior at this point, in contrast to branch_4x. One could
argue that this is because all previous back-compat versions were for 3.X, but
this issue introduced a 4.0 version, which falls within the version X-1
window for trunk/5.0. Should I forward-port the 4.0 back-compat support from
branch_4x for StandardTokenizer and UAX29URLEmailTokenizer? Other analysis
components on trunk still behave differently based on version, so the
practice clearly has not been abandoned there.
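
For readers unfamiliar with the pattern, here is a minimal sketch of what
"version-based behavior" means for a tokenizer. It is illustrative only and
does not use Lucene's actual classes: SketchVersion, selectRules, and the rule
labels are all hypothetical names, and the cutover at 4.7 is an assumption
based on this issue's Fix For field.

```java
// Hypothetical sketch; none of these names are Lucene APIs.
class VersionGatedTokenizerSketch {

    /** Stand-in for a match-version constant, declared oldest to newest. */
    enum SketchVersion {
        V_4_0, V_4_7, V_5_0;

        /** True if this version is the same as or newer than the other. */
        boolean onOrAfter(SketchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    /** Returns a label for the rule set a tokenizer would apply. */
    static String selectRules(SketchVersion matchVersion) {
        if (matchVersion.onOrAfter(SketchVersion.V_4_7)) {
            // Upgraded Unicode rules (cutover version is an assumption).
            return "current-rules";
        }
        // Frozen 4.0-era behavior, kept so older indexes tokenize the same way.
        return "rules-4.0";
    }

    public static void main(String[] args) {
        for (SketchVersion v : SketchVersion.values()) {
            System.out.println(v + " -> " + selectRules(v));
        }
    }
}
```

Forward-porting the 4.0 back-compat code would amount to keeping the second
branch alive on trunk; dropping it would mean every version gets the current
rules.
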
> Upgrade StandardTokenizer & co to latest unicode rules
> ------------------------------------------------------
>
> Key: LUCENE-5357
> URL: https://issues.apache.org/jira/browse/LUCENE-5357
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Steve Rowe
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5357.patch
>
>
> Besides any change in data, the rules have also changed (regional indicators,
> better handling for Hebrew, etc.)