[ 
https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317145#comment-17317145
 ] 

Robert Muir commented on LUCENE-9914:
-------------------------------------

Yes, that one looks great: I think a similar Groovy script can work here (using 
the snippet above). We just have to use ICU 62 for now so that we get Unicode 11 
property data, matching the version of Unicode that the JFlex grammar uses (I 
think it only makes sense for the whole grammar to be self-consistent in that 
respect; we shouldn't mix and match).

FYI, that one could also be done in a similar, more efficient way with a 
UnicodeSet on the "White_Space" property, rather than looping through every 
code point. But maybe it is fast enough that no one cares :)
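
For reference, here is a minimal sketch of the "loop over every code point" 
approach mentioned above, using only the JDK. It is an illustration, not the 
actual script from this issue: the class name WhiteSpaceRanges and the method 
collectRanges are hypothetical. It relies on java.util.regex's 
\p{IsWhite_Space} binary property, so the result follows whatever Unicode 
version the running JDK ships rather than being pinned to Unicode 11 the way 
ICU 62 is. The set-based alternative would build an ICU4J UnicodeSet from the 
pattern "[:White_Space:]" and read its ranges instead of scanning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects code point ranges that have the White_Space property.
public class WhiteSpaceRanges {
    // JDK regex spelling of the Unicode White_Space binary property.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    /** Returns inclusive {start, end} code point ranges, in ascending order. */
    static List<int[]> collectRanges() {
        List<int[]> ranges = new ArrayList<>();
        Matcher m = WHITE_SPACE.matcher("");
        int start = -1; // -1 means "not currently inside a range"
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean ws = m.reset(new String(Character.toChars(cp))).matches();
            if (ws && start < 0) {
                start = cp;                              // a new range opens here
            } else if (!ws && start >= 0) {
                ranges.add(new int[] {start, cp - 1});   // the range closed at cp - 1
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, Character.MAX_CODE_POINT});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : collectRanges()) {
            System.out.println(r[0] == r[1]
                ? String.format("U+%04X", r[0])
                : String.format("U+%04X..U+%04X", r[0], r[1]));
        }
    }
}
```

The scan touches all 1,114,112 code points, which is the cost a UnicodeSet 
avoids by exposing its ranges directly.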

> Modernize Emoji regeneration scripts
> ------------------------------------
>
>                 Key: LUCENE-9914
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9914
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>
> These are Perl scripts... I don't think they had Ant tasks in 8.x and they 
> haven't been used in a while. They don't seem too scary (for Perl): they just 
> fetch emoji Unicode descriptions and parse them into a JFlex macro and a test 
> case.
> It'd be good to convert them to Python, Groovy or even Java so that they 
> fit better in the build system. Alternatively, perhaps there is a way to get 
> these code point properties from Java directly?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
