[jira] Commented: (LUCENE-2911) synchronize grammar/token types across StandardTokenizer, UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.

Steven Rowe (JIRA) Tue, 08 Feb 2011 07:51:24 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992014#comment-12992014 ]


Steven Rowe commented on LUCENE-2911:
-------------------------------------

When I run the generator, the top-level domain macro file gets a bunch of new
entries, but these aren't included in your patch. I think we should keep this
list up-to-date.
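
For reference, a minimal sketch of what regenerating the TLD macro amounts to,
assuming the generator takes a plain list of TLD strings. The class
`TldMacroSketch`, the macro name `ASCIITLD`, and the output layout are
hypothetical; the actual generator's input source and output format may differ.
Sorting the entries keeps regenerated output stable, so newly added TLDs show up
as small diffs:

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical sketch: renders a list of ASCII TLDs as one JFlex macro. */
public class TldMacroSketch {

  /** Case-insensitive JFlex pattern for one TLD, e.g. "com" becomes [cC][oO][mM]. */
  static String caseInsensitive(String tld) {
    StringBuilder sb = new StringBuilder();
    for (char c : tld.toCharArray()) {
      if (Character.isLetter(c)) {
        sb.append('[').append(Character.toLowerCase(c))
          .append(Character.toUpperCase(c)).append(']');
      } else if (Character.isDigit(c)) {
        sb.append(c);                           // digits match themselves
      } else {
        sb.append('"').append(c).append('"');   // quote punctuation such as '-'
      }
    }
    return sb.toString();
  }

  /** Builds the macro; sorting and de-duplicating keeps regenerated output diff-friendly. */
  static String buildMacro(List<String> tlds) {
    TreeSet<String> sorted = new TreeSet<>();
    for (String tld : tlds) {
      sorted.add(tld.toLowerCase(Locale.ROOT));
    }
    StringBuilder sb = new StringBuilder("ASCIITLD = \".\" (\n");
    String sep = "\t  ";
    for (String tld : sorted) {
      sb.append(sep).append(caseInsensitive(tld)).append('\n');
      sep = "\t| ";
    }
    return sb.append("\t)\n").toString();
  }

  public static void main(String[] args) {
    System.out.print(buildMacro(List.of("org", "com", "xn--p1ai")));
  }
}
```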

The patch is missing HangulSupp macro generation in
modules/icu/src/tools/.../GenerateJFlexSupplementaryMacros.java, but since the
Hangul macro is not used in the JFlex grammar, this doesn't cause a problem.
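
Background on why a HangulSupp macro would exist at all: without native
supplementary character support, a JFlex grammar only sees 16-bit chars, so a
supplementary code point range has to be spelled out as UTF-16 surrogate pairs.
A minimal sketch of that conversion for a single range, assuming the range is
passed in directly; `SurrogateMacroSketch` and `rangeToSurrogatePatterns` are
hypothetical names, not the API of GenerateJFlexSupplementaryMacros, which
presumably derives its ranges from ICU property data:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a supplementary code point range as UTF-16 surrogate-pair patterns. */
public class SurrogateMacroSketch {

  /** Formats a char as a JFlex escape with four uppercase hex digits. */
  static String esc(char c) {
    return String.format("\\u%04X", (int) c);
  }

  /** A one-char or ranged JFlex character class. */
  static String cls(char from, char to) {
    return from == to ? "[" + esc(from) + "]" : "[" + esc(from) + "-" + esc(to) + "]";
  }

  /** start and end are inclusive supplementary code points (U+10000 or above). */
  static List<String> rangeToSurrogatePatterns(int start, int end) {
    char hiStart = Character.highSurrogate(start);
    char loStart = Character.lowSurrogate(start);
    char hiEnd = Character.highSurrogate(end);
    char loEnd = Character.lowSurrogate(end);
    List<String> alts = new ArrayList<>();
    if (hiStart == hiEnd) {
      // The whole range shares a single high surrogate.
      alts.add(cls(hiStart, hiStart) + cls(loStart, loEnd));
      return alts;
    }
    // First high surrogate: from loStart to the last low surrogate.
    alts.add(cls(hiStart, hiStart) + cls(loStart, Character.MAX_LOW_SURROGATE));
    // High surrogates strictly in between take every low surrogate.
    if (hiEnd - hiStart > 1) {
      alts.add(cls((char) (hiStart + 1), (char) (hiEnd - 1))
          + cls(Character.MIN_LOW_SURROGATE, Character.MAX_LOW_SURROGATE));
    }
    // Last high surrogate: from the first low surrogate to loEnd.
    alts.add(cls(hiEnd, hiEnd) + cls(Character.MIN_LOW_SURROGATE, loEnd));
    return alts;
  }

  public static void main(String[] args) {
    // CJK Unified Ideographs Extension B: U+20000..U+2A6DF.
    System.out.println(String.join(" | ", rangeToSurrogatePatterns(0x20000, 0x2A6DF)));
  }
}
```

For Extension B this yields three alternatives: a block under high surrogate
D840, a full block for D841 through D868, and a partial block under D869 ending
at low surrogate DEDF.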

It would be nice to remove the hard-coded ranges for the intersection of Hangul
& ALetter. But when I tried to produce the equivalent using JFlex negation and
union, memory usage exploded and I couldn't get JFlex to generate the scanner.
I guess we'll have to wait for native JFlex supplementary character support
before we can change it.
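
The hard-coded ranges amount to intersecting the two code point sets ahead of
time, outside JFlex, rather than asking JFlex to build !( !A | !B ) from
negation and union. A minimal sketch of that precomputation, assuming both sets
are given as sorted, non-overlapping inclusive ranges within the BMP; the class
and method names are hypothetical, and the sample ranges are toy values, not
real Hangul or ALetter data:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: precompute a set intersection as a JFlex character class. */
public class RangeIntersectSketch {

  /** Intersects two sorted lists of non-overlapping inclusive [lo, hi] ranges (two-pointer merge). */
  static List<int[]> intersect(List<int[]> a, List<int[]> b) {
    List<int[]> out = new ArrayList<>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
      int lo = Math.max(a.get(i)[0], b.get(j)[0]);
      int hi = Math.min(a.get(i)[1], b.get(j)[1]);
      if (lo <= hi) {
        out.add(new int[] {lo, hi});
      }
      // Advance whichever range ends first; it cannot overlap anything later.
      if (a.get(i)[1] < b.get(j)[1]) {
        i++;
      } else {
        j++;
      }
    }
    return out;
  }

  /** Renders BMP ranges as one JFlex character class. */
  static String toJFlexClass(List<int[]> ranges) {
    StringBuilder sb = new StringBuilder("[");
    for (int[] r : ranges) {
      sb.append(String.format("\\u%04X", r[0]));
      if (r[1] != r[0]) {
        sb.append('-').append(String.format("\\u%04X", r[1]));
      }
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // Toy ranges for illustration only, NOT real Unicode property data.
    List<int[]> a = List.of(new int[] {0x100, 0x1FF}, new int[] {0x300, 0x3FF});
    List<int[]> b = List.of(new int[] {0x180, 0x320});
    System.out.println(toJFlexClass(intersect(a, b)));
  }
}
```

With the toy input this prints a class covering 0180-01FF and 0300-0320.
Supplementary ranges would additionally need the surrogate-pair expansion that
JFlex cannot yet do natively.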


> synchronize grammar/token types across StandardTokenizer, 
> UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2911
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2911
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: Analysis
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1
>
>         Attachments: LUCENE-2911.patch
>
>
> I'd like to do LUCENE-2906 (better CJK support for these tokenizers) for a
> future target such as 3.2.
> But in 3.1 I would like to do a little cleanup first, and synchronize all
> these token types, etc.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
