[ https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713437#comment-16713437 ]
Steve Rowe commented on LUCENE-8527:
------------------------------------

[~rcmuir] mentioned on LUCENE-8125 that StandardTokenizer should give such sequences the {{<EMOJI>}} token type - see the logic in the {{icu}} module's {{BreakIteratorWrapper}}.

JFlex 1.7.0 supports Unicode 9.0, which, if I'm interpreting the discussion at http://www.unicode.org/L2/L2016/16315r-handling-seg-emoji.pdf properly, does not (fully) include Emoji sequence support (though customized rules that would do that properly in Unicode 9.0 are listed in that doc).

Should we include the (post-9.0) customized rules for Unicode 9.0?

> Upgrade JFlex to 1.7.0
> ----------------------
>
>                 Key: LUCENE-8527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8527
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: general/build, modules/analysis
>            Reporter: Steve Rowe
>            Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: [http://jflex.de/changelog.html#jflex-1.7.0]. We should upgrade.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
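
For readers unfamiliar with what "emoji sequences" means in the comment above, here is a rough, JDK-only sketch of the kind of code points that mark them (ZWJ, VARIATION SELECTOR-16, emoji modifiers, regional indicators). It is not the logic in {{BreakIteratorWrapper}}, not the customized rules from the Unicode document, and not how StandardTokenizer works; the class {{EmojiTypeSketch}} and the method {{guessType}} are hypothetical names invented for illustration, and the heuristic is deliberately crude (ZWJ, for instance, also occurs in non-emoji text).

```java
public class EmojiTypeSketch {
    // Code points that commonly appear inside emoji sequences.
    static final int ZWJ = 0x200D;   // ZERO WIDTH JOINER
    static final int VS16 = 0xFE0F;  // VARIATION SELECTOR-16 (emoji presentation)

    static boolean isEmojiModifier(int cp) {
        // Skin tone modifiers, U+1F3FB..U+1F3FF.
        return cp >= 0x1F3FB && cp <= 0x1F3FF;
    }

    static boolean isRegionalIndicator(int cp) {
        // Regional indicator symbols used in flag sequences, U+1F1E6..U+1F1FF.
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    /**
     * Hypothetical helper: returns "<EMOJI>" when the token contains any of the
     * sequence markers above, and "<OTHER>" otherwise. A crude heuristic only;
     * it does not implement the UAX #29 segmentation rules.
     */
    public static String guessType(String token) {
        boolean hasMarker = token.codePoints().anyMatch(cp ->
                cp == ZWJ || cp == VS16 || isEmojiModifier(cp) || isRegionalIndicator(cp));
        return hasMarker ? "<EMOJI>" : "<OTHER>";
    }

    public static void main(String[] args) {
        // MAN + ZWJ + WOMAN + ZWJ + GIRL, built from code points to avoid escapes.
        String family = new String(new int[] {0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467}, 0, 5);
        // Regional indicators U and S (a flag sequence).
        String flag = new String(new int[] {0x1F1FA, 0x1F1F8}, 0, 2);

        System.out.println(guessType(family));
        System.out.println(guessType(flag));
        System.out.println(guessType("lucene"));
    }
}
```

The point of the sketch is only that a sequence spans several code points that a tokenizer has to keep together and type as one token; whether the Unicode 9.0 rules shipped with JFlex 1.7.0 do that is exactly the question raised above.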