[ https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713437#comment-16713437 ]

Steve Rowe commented on LUCENE-8527:
------------------------------------

[~rcmuir] mentioned on LUCENE-8125 that StandardTokenizer should give emoji 
sequences the {{<EMOJI>}} token type; see the logic in the {{icu}} module's 
{{BreakIteratorWrapper}}.

JFlex 1.7.0 supports Unicode 9.0. If I'm interpreting the discussion at 
http://www.unicode.org/L2/L2016/16315r-handling-seg-emoji.pdf properly, Unicode 
9.0 does not fully support emoji sequences, though that doc lists customized 
rules that would handle them properly under Unicode 9.0.

Should we include the (post-9.0) customized rules for Unicode 9.0?
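For illustration only, here is a rough, hypothetical Java sketch of what recognizing emoji sequences involves. It checks simplified versions of the sequence forms defined in UTS #51 (flag pairs, keycap sequences, modifier sequences and ZWJ sequences). The class and method names are made up, and the code point ranges are crude assumptions standing in for real Unicode emoji property data (such as what ICU provides). This is neither a JFlex grammar nor what {{BreakIteratorWrapper}} actually does.

```java
// Hypothetical sketch: simplified emoji-sequence check over a single token.
// The ranges below are ASSUMED approximations, not the Unicode emoji data.
final class EmojiSequenceSketch {
    private EmojiSequenceSketch() {}

    static final int ZWJ = 0x200D;    // ZERO WIDTH JOINER
    static final int VS16 = 0xFE0F;   // VARIATION SELECTOR-16 (emoji presentation)
    static final int KEYCAP = 0x20E3; // COMBINING ENCLOSING KEYCAP

    static boolean isRegionalIndicator(int cp) { return cp >= 0x1F1E6 && cp <= 0x1F1FF; }

    static boolean isSkinToneModifier(int cp) { return cp >= 0x1F3FB && cp <= 0x1F3FF; }

    static boolean isKeycapBase(int cp) {
        return (cp >= '0' && cp <= '9') || cp == '#' || cp == '*';
    }

    // Crude stand-in for a pictographic-emoji property: two common blocks only,
    // excluding the skin tone modifiers, which never start an element.
    static boolean isPictographic(int cp) {
        boolean inBlocks = (cp >= 0x1F300 && cp <= 0x1FAFF) || (cp >= 0x2600 && cp <= 0x27BF);
        return inBlocks && !isSkinToneModifier(cp);
    }

    /** Returns true if s is (roughly) one emoji sequence under the simplified rules above. */
    static boolean isEmojiSequence(String s) {
        int[] cps = s.codePoints().toArray();
        if (cps.length == 0) return false;

        // Flag sequence: exactly two regional indicators.
        if (cps.length == 2 && isRegionalIndicator(cps[0]) && isRegionalIndicator(cps[1])) {
            return true;
        }
        // Keycap sequence: [0-9#*] U+FE0F U+20E3.
        if (cps.length == 3 && isKeycapBase(cps[0]) && cps[1] == VS16 && cps[2] == KEYCAP) {
            return true;
        }
        // ZWJ sequence: element (ZWJ element)*, where an element is a pictographic
        // code point optionally followed by VS16 or a skin tone modifier.
        // A single element (e.g. a modifier sequence) also matches.
        int i = 0;
        while (true) {
            if (i >= cps.length || !isPictographic(cps[i])) return false;
            i++;
            if (i < cps.length && (cps[i] == VS16 || isSkinToneModifier(cps[i]))) i++;
            if (i == cps.length) return true;
            if (cps[i] != ZWJ) return false;
            i++; // consume the ZWJ; another element must follow
        }
    }
}
```

Under these simplified rules, a regional-indicator pair, "1" + U+FE0F + U+20E3, a thumbs-up with a skin tone modifier, and a man/woman/girl ZWJ family would each count as one emoji sequence, while plain "abc" or a dangling ZWJ would not.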


> Upgrade JFlex to 1.7.0
> ----------------------
>
>                 Key: LUCENE-8527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8527
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: general/build, modules/analysis
>            Reporter: Steve Rowe
>            Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
