[jira] [Commented] (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

Tom Burton-West (JIRA) Wed, 31 Aug 2011 14:12:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094902#comment-13094902 ]


Tom Burton-West commented on LUCENE-2906:
-----------------------------------------

Any chance this might get implemented for 3.4?


> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2906
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2906
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Tom Burton-West
>            Priority: Minor
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK, as in the 
> CJKAnalyzer. This filter would take the output of the ICUTokenizer, read 
> the ScriptAttribute, and produce overlapping bigrams for selected scripts 
> (Han, Kana).
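
To make the request concrete, here is a minimal sketch of the bigramming step in plain Java. It does not use the Lucene API and is not the attached patch: the `Token` class, the script labels ("Han", "Hiragana", "Katakana", "Latin") and the `bigrams` method are hypothetical stand-ins for a token stream carrying a ScriptAttribute. It also assumes that consecutive CJK tokens are adjacent in the original text, which a real filter would have to check (for example via offsets).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CjkBigramSketch {

    /** Hypothetical token: its text plus the script label the tokenizer assigned. */
    public static final class Token {
        final String text;
        final String script;

        public Token(String text, String script) {
            this.text = text;
            this.script = script;
        }
    }

    /** Scripts selected for bigramming; the label strings are illustrative assumptions. */
    static boolean isBigramScript(String script) {
        return script.equals("Han") || script.equals("Hiragana") || script.equals("Katakana");
    }

    /**
     * Replaces each run of consecutive CJK unigrams with overlapping bigrams.
     * Tokens in other scripts pass through unchanged, and a CJK unigram with
     * no CJK neighbour is kept as a unigram.
     */
    public static List<String> bigrams(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token t = tokens.get(i);
            if (!isBigramScript(t.script)) {
                out.add(t.text);
                i++;
                continue;
            }
            // Find the end (exclusive) of the current run of CJK tokens.
            int end = i;
            while (end < tokens.size() && isBigramScript(tokens.get(end).script)) {
                end++;
            }
            if (end - i == 1) {
                out.add(t.text); // isolated CJK unigram
            } else {
                for (int j = i; j + 1 < end; j++) {
                    out.add(tokens.get(j).text + tokens.get(j + 1).text);
                }
            }
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("the", "Latin"),
                new Token("一", "Han"),
                new Token("二", "Han"),
                new Token("三", "Han"));
        System.out.println(bigrams(tokens)); // prints [the, 一二, 二三]
    }
}
```

In this sketch a run may mix Han and Kana tokens; whether the real filter should form bigrams across script boundaries is a design choice the issue leaves open.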

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

