[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169481#comment-15169481 ]
Uwe Schindler commented on LUCENE-6993:
---------------------------------------
bq. Uwe Schindler has written that he still recommends this tokenizer in some
cases, so if you're asking if we should remove it, I don't think so.
I think the question was whether it should also be upgraded to a newer Unicode
version. But it does not rely on any Unicode version, so the Java files should
be identical. Please don't remove it!
> Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all
> JFlex-based tokenizers to support Unicode 8.0
> ----------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-6993
> URL: https://issues.apache.org/jira/browse/LUCENE-6993
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Mike Drob
> Assignee: Robert Muir
> Fix For: 6.0
>
> Attachments: LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch,
> LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the
> list of TLDs again. Comparing our old list with a new list indicates 800+ new
> domains, so it would be nice to include them.
> Also the JFlex tokenizer grammars should be upgraded to support Unicode 8.0.
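
Comparing the old and new TLD lists, as described above, amounts to a set difference. The following is a minimal sketch in Java, assuming both lists are plain-text files with one TLD per line and `#` comment lines (the layout of IANA's `tlds-alpha-by-domain.txt`). The class name `TldDiff` is hypothetical; it is not part of Lucene's build tooling and is not how the patch on this issue generates the TLD macros.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports TLDs that appear in a new list but not in an old one.
class TldDiff {

    /** Parses TLD lines, skipping blanks and '#' comments; lower-cases entries for comparison. */
    static Set<String> parseTlds(List<String> lines) {
        Set<String> tlds = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // e.g. the "# Version ..." header line
            }
            tlds.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return tlds;
    }

    /** Returns the TLDs present in newTlds but absent from oldTlds, sorted. */
    static Set<String> addedTlds(Set<String> oldTlds, Set<String> newTlds) {
        Set<String> added = new TreeSet<>(newTlds);
        added.removeAll(oldTlds);
        return added;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TldDiff <old-list.txt> <new-list.txt>");
            System.exit(1);
        }
        // Files.readAllLines(Path) decodes as UTF-8.
        Set<String> oldTlds = parseTlds(Files.readAllLines(Paths.get(args[0])));
        Set<String> newTlds = parseTlds(Files.readAllLines(Paths.get(args[1])));
        Set<String> added = addedTlds(oldTlds, newTlds);
        System.out.println(added.size() + " new TLDs");
        added.forEach(System.out::println);
    }
}
```

Lower-casing with `Locale.ROOT` keeps the comparison independent of the default locale and of whether a given list uses upper- or lower-case entries.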
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)