[jira] [Comment Edited] (LUCENE-9064) Can we remove the FST cache in Kuromoji and Nori analyzers?

Michael Sokolov (Jira) Mon, 25 Nov 2019 09:44:08 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981739#comment-16981739
 ]


Michael Sokolov edited comment on LUCENE-9064 at 11/25/19 5:43 PM:
-------------------------------------------------------------------

[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia (commented 
out). To get it running you must download jawiki from Wikipedia and edit the 
test to point at the file you downloaded. You might also have to disable the 
security manager checks that prevent reading from arbitrary places in the 
filesystem.


was (Author: sokolov):
[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia. To get it 
running you must download jawiki from Wikipedia and edit the test to point at 
the file you downloaded. You might also have to disable the security manager 
checks that prevent reading from arbitrary places in the filesystem.

> Can we remove the FST cache in Kuromoji and Nori analyzers?
> -----------------------------------------------------------
>
>                 Key: LUCENE-9064
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9064
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Minor
>
> Is the ~30k Han cache in Kuromoji redundant after LUCENE-8920?
> ([https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38])
> The entire purpose of the linked file is this caching, so if it's not 
> needed anymore it would be a nice cleanup. But it was definitely needed for 
> good performance before, so we should be careful. The Nori analyzer has the 
> exact same thing (the file has the same name) for ~10k Hangul syllables.
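
The cache discussed in the description can be pictured as a table of
precomputed lookups for one contiguous range of characters, consulted before
falling back to the slower path. The sketch below is only an illustration
under assumptions: the class RangeCacheSketch, the slowLookup function, and
the range U+4E00 to U+9FFF are hypothetical stand-ins, not Lucene's
TokenInfoFST code or its actual cached range.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch: precomputed results for a contiguous character range. */
public class RangeCacheSketch {
    private final int start;                    // first cached character (inclusive)
    private final long[] cached;                // one precomputed result per cached character
    private final IntToLongFunction slowLookup; // stand-in for the uncached traversal

    public RangeCacheSketch(int start, int end, IntToLongFunction slowLookup) {
        this.start = start;
        this.slowLookup = slowLookup;
        this.cached = new long[end - start + 1];
        // Pay the slow lookup once per character in the range, up front.
        for (int i = 0; i < cached.length; i++) {
            cached[i] = slowLookup.applyAsLong(start + i);
        }
    }

    /** Returns the cached result when ch is in range, else falls back to the slow path. */
    public long lookup(int ch) {
        int idx = ch - start;
        if (idx >= 0 && idx < cached.length) {
            return cached[idx];
        }
        return slowLookup.applyAsLong(ch);
    }

    /** Number of table entries, i.e. the memory cost of the cache in slots. */
    public int cachedCount() {
        return cached.length;
    }
}
```

Whether such a table still pays for itself after LUCENE-8920 is the question
the issue raises; a benchmark such as the Wikipedia test mentioned in the
comment would be needed to decide.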




