[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316744#comment-17316744 ]
Robert Muir commented on LUCENE-9914:
-------------------------------------
This seems to print what is currently needed... or it's at least close?
{code}
import com.ibm.icu.text.UnicodeSet;

public class Generate {
  public static void main(String[] args) {
    // Emoji-related Unicode properties to dump, one line per property.
    String[] sets = { "Emoji", "Emoji_Modifier",
                      "Emoji_Modifier_Base", "Extended_Pictographic" };
    for (String setName : sets) {
      // Ask ICU for every code point that has this property.
      UnicodeSet set = new UnicodeSet("[:" + setName + ":]");
      System.out.print(setName + " = [");
      for (UnicodeSet.EntryRange range : set.ranges()) {
        if (range.codepoint == range.codepointEnd) {
          // Single code point: print one escape.
          System.out.print("\\u{" + Integer.toHexString(range.codepoint) + "}");
        } else {
          // Contiguous run: print start and end escapes joined by a hyphen.
          System.out.print("\\u{" + Integer.toHexString(range.codepoint) +
                           "}-\\u{" + Integer.toHexString(range.codepointEnd) + "}");
        }
      }
      System.out.println("]");
    }
  }
}
{code}
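To check the output format without needing ICU on the classpath, the range-printing part could be pulled out into a small helper. This is only a sketch: the class name RangeFormat and its format method are hypothetical names made up for illustration, and the sample ranges are arbitrary values rather than real property data.

```java
public class RangeFormat {
    // Hypothetical helper (not part of the script above): renders inclusive
    // [start, end] code point pairs using the same escape syntax as the
    // generator, one escape per single code point and start-end for a run.
    public static String format(String setName, int[][] ranges) {
        StringBuilder sb = new StringBuilder(setName).append(" = [");
        for (int[] r : ranges) {
            sb.append("\\u{").append(Integer.toHexString(r[0])).append('}');
            if (r[0] != r[1]) {
                sb.append("-\\u{").append(Integer.toHexString(r[1])).append('}');
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Arbitrary sample ranges, chosen only to exercise both branches.
        int[][] sample = { {0x23, 0x23}, {0x30, 0x39}, {0x1F600, 0x1F64F} };
        System.out.println(format("Sample", sample));
    }
}
```

Building the string first, rather than printing piece by piece, would make it easy to compare the generated macro against a checked-in copy.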
> Modernize Emoji regeneration scripts
> ------------------------------------
>
> Key: LUCENE-9914
> URL: https://issues.apache.org/jira/browse/LUCENE-9914
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
>
> These are Perl scripts... I don't think they had Ant tasks in 8x and they
> haven't been used in a while. They don't seem too scary (for Perl) - they just
> fetch emoji Unicode descriptions and parse them into a JFlex macro and a test
> case.
> It'd be good to convert them to Python, Groovy or even Java so that they
> fit better in the build system. Alternatively - perhaps there is a way to get
> these codepoint properties from Java directly?