[ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992014#comment-12992014 ]
Steven Rowe commented on LUCENE-2911:
-------------------------------------
When I run this, the generated top-level domain macro file gains a number of
new entries that are not included in your patch. I think we should keep this
list up to date.
The patch is missing HangulSupp macro generation in
modules/icu/src/tools/.../GenerateJFlexSupplementaryMacros.java, but since the
Hangul macro is not used in the JFlex grammar, this doesn't cause a problem.
It would be nice to remove the hard-coded ranges for the intersection of Hangul
& ALetter. However, when I tried to produce the equivalent with JFlex negation
and union, memory usage exploded and JFlex could not finish generating. I guess
we'll have to wait for native JFlex supplementary character support before we
can change it.
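For illustration only, the set identity behind "negation and union" is
A & B = !(!A | !B). The sketch below applies it to plain code point sets
outside of JFlex and prints the result as ranges, which is roughly how
hard-coded ranges could be derived offline. It makes several assumptions: the
class and method names are hypothetical, Character.isLetter is only a stand-in
for the UAX #29 Word_Break=ALetter property (which the JDK does not expose),
and it does not reproduce or explain the JFlex memory behavior described
above.

```java
import java.util.BitSet;

// Hypothetical sketch: not part of the patch or of the Lucene code base.
class HangulLetterIntersection {
    // One past the largest Unicode code point (0x10FFFF).
    static final int LIMIT = Character.MAX_CODE_POINT + 1;

    // Intersection built only from complement and union: !(!A | !B).
    static BitSet intersectViaDeMorgan(BitSet a, BitSet b) {
        BitSet notA = (BitSet) a.clone();
        notA.flip(0, LIMIT);
        BitSet notB = (BitSet) b.clone();
        notB.flip(0, LIMIT);
        notA.or(notB);          // !A | !B
        notA.flip(0, LIMIT);    // !(!A | !B)
        return notA;
    }

    // Render a set as comma-separated code point ranges, e.g. "U+0002-U+0003".
    static String toRanges(BitSet set) {
        StringBuilder sb = new StringBuilder();
        int start = set.nextSetBit(0);
        while (start >= 0) {
            int end = set.nextClearBit(start) - 1;
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X", start));
            if (end > start) {
                sb.append(String.format("-U+%04X", end));
            }
            start = set.nextSetBit(end + 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet hangul = new BitSet(LIMIT);
        BitSet letters = new BitSet(LIMIT);
        for (int cp = 0; cp < LIMIT; cp++) {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HANGUL) {
                hangul.set(cp);
            }
            // ASSUMPTION: isLetter approximates ALetter; it is not the same set.
            if (Character.isLetter(cp)) {
                letters.set(cp);
            }
        }
        System.out.println(toRanges(intersectViaDeMorgan(hangul, letters)));
    }
}
```

The printed ranges depend on the Unicode version shipped with the JDK in use,
so they should not be expected to match the ranges in the grammar.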
> synchronize grammar/token types across StandardTokenizer,
> UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.
> --------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2911
> URL: https://issues.apache.org/jira/browse/LUCENE-2911
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: Analysis
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2911.patch
>
>
> I'd like to do LUCENE-2906 (better CJK support for these tokenizers) for a
> future target such as 3.2.
> But in 3.1 I would like to do a little cleanup first and synchronize all
> these token types, etc.