[jira] [Commented] (LUCENE-9064) Can we remove the FST cache in Kuromoji and Nori analyzers?

Michael Sokolov (Jira) Mon, 25 Nov 2019 09:44:07 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981739#comment-16981739 ]


Michael Sokolov commented on LUCENE-9064:
-----------------------------------------

[~bruno.roustant] there is {{TestJapaneseTokenizer.testWikipedia}}. To get it
running, you must download a jawiki dump from Wikipedia and edit the test to
point at the downloaded file. You may also have to disable the security manager
checks that prevent reading from arbitrary places in the filesystem.

> Can we remove the FST cache in Kuromoji and Nori analyzers?
> -----------------------------------------------------------
>
>                 Key: LUCENE-9064
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9064
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Minor
>
> Is the ~30k han cache in kuromoji redundant after LUCENE-8920?
> [https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38]
> The entire linked file revolves around this caching, so if it's not
> needed anymore, removing it would be a nice cleanup. But it was definitely
> needed for good performance before, so we should be careful. The Nori analyzer
> has exactly the same thing (a file with the same name) for ~10k hangul syllables.
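To make the caching question concrete, here is a minimal, hypothetical Java sketch of the general idea: the results of the root-level lookup for a contiguous character range are precomputed into a dense array, so the first character of a candidate word costs one array read instead of a search. The class name, the 0x3040-0x9FFF range, and the TreeMap standing in for the FST are all assumptions for illustration; this is not Lucene's TokenInfoFST API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a root-arc cache: a dense array covering a
 * contiguous code-point range, consulted before a slower general lookup.
 * The TreeMap is only a stand-in for the FST root-arc search, NOT Lucene's API.
 */
public class RootArcCacheSketch {
  // Hypothetical range: start of the Hiragana block to end of CJK Unified Ideographs.
  static final int CACHE_BASE = 0x3040;
  static final int CACHE_CEILING = 0x9FFF;

  private final TreeMap<Integer, Long> slowIndex; // stand-in for walking the FST root
  private final Long[] rootCache;                  // null entry = no arc for that char

  RootArcCacheSketch(Map<Integer, Long> rootArcs) {
    this.slowIndex = new TreeMap<>(rootArcs);
    this.rootCache = new Long[CACHE_CEILING - CACHE_BASE + 1];
    // Precompute once: every char in the cached range gets its lookup result stored.
    for (int i = 0; i < rootCache.length; i++) {
      rootCache[i] = slowIndex.get(CACHE_BASE + i);
    }
  }

  /** Returns the output for a first char, or null if no entry starts with it. */
  Long findRootArc(int ch) {
    if (ch >= CACHE_BASE && ch <= CACHE_CEILING) {
      return rootCache[ch - CACHE_BASE]; // cached path: one array read
    }
    return slowIndex.get(ch); // fallback path: general lookup
  }

  public static void main(String[] args) {
    Map<Integer, Long> arcs = Map.of((int) '日', 1L, (int) '本', 2L, (int) 'A', 3L);
    RootArcCacheSketch s = new RootArcCacheSketch(arcs);
    System.out.println(s.findRootArc('日')); // cached path, prints 1
    System.out.println(s.findRootArc('A'));  // fallback path, prints 3
    System.out.println(s.findRootArc('語')); // in range but absent, prints null
  }
}
```

The trade-off is memory for speed: the table holds one slot per code point in the range whether or not an entry exists. Whether it still pays off depends on how fast the uncached lookup has become, which is what a benchmark such as the Wikipedia test would need to show.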



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

