[
https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981739#comment-16981739
]
Michael Sokolov edited comment on LUCENE-9064 at 11/25/19 5:43 PM:
-------------------------------------------------------------------
[~bruno.roustant] there is \{TestJapaneseTokenizer.testWikipedia} (commented
out). To get it running you must download jawiki from Wikipedia and edit the
test to point at the file you downloaded. You might also have to disable
security manager checks that prevent reading from random places in the
filesystem.
was (Author: sokolov):
[~bruno.roustant] there is \{TestJapaneseTokenizer.testWikipedia}. To get it
running you must download jawiki from Wikipedia and edit the test to point at
the file you downloaded. You might also have to disable security manager checks
that prevent reading from random places in the filesystem.
> Can we remove the FST cache in Kuromoji and Nori analyzers?
> -----------------------------------------------------------
>
> Key: LUCENE-9064
> URL: https://issues.apache.org/jira/browse/LUCENE-9064
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Minor
>
> Is the ~30k Han cache in kuromoji redundant after LUCENE-8920?
> ([https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38])
> The entire linked file's purpose revolves around this caching, so if it's not
> needed anymore it would be a nice cleanup. But it was definitely needed for
> good performance before, so we should be careful. The Nori analyzer has the exact
> same thing (file has the same name) for ~10k Hangul syllables.
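For readers unfamiliar with this kind of cache, the general idea can be pictured as a dense array of precomputed results for a contiguous code point range, with everything outside the range falling back to the slower lookup. The following is only a minimal sketch of that idea, not the TokenInfoFST code: the class name, the `slowLookup` function, and the cached range are all hypothetical and chosen purely for illustration.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical illustration of a range cache; this is NOT Lucene's TokenInfoFST. */
public class RangeCacheSketch {
    private final int floor;                     // first cached code point
    private final int ceiling;                   // last cached code point (inclusive)
    private final int[] cached;                  // results indexed by (codePoint - floor)
    private final IntUnaryOperator slowLookup;   // stand-in for the expensive lookup
    private int slowCalls = 0;                   // slow lookups performed after construction

    public RangeCacheSketch(int floor, int ceiling, IntUnaryOperator slowLookup) {
        this.floor = floor;
        this.ceiling = ceiling;
        this.slowLookup = slowLookup;
        this.cached = new int[ceiling - floor + 1];
        // Pay the lookup cost once per code point up front.
        for (int cp = floor; cp <= ceiling; cp++) {
            cached[cp - floor] = slowLookup.applyAsInt(cp);
        }
    }

    /** Returns the cached result when in range, otherwise falls back to the slow path. */
    public int lookup(int codePoint) {
        if (codePoint >= floor && codePoint <= ceiling) {
            return cached[codePoint - floor];
        }
        slowCalls++;
        return slowLookup.applyAsInt(codePoint);
    }

    public int slowCallsAfterConstruction() {
        return slowCalls;
    }

    public static void main(String[] args) {
        // The range below is an arbitrary example, not the range used by kuromoji or Nori.
        RangeCacheSketch cache = new RangeCacheSketch(0x4E00, 0x9FFF, cp -> cp ^ 0x5A5A);
        System.out.println(cache.lookup(0x65E5)); // served from the array
        System.out.println(cache.lookup('a'));    // falls through to slowLookup
        System.out.println(cache.slowCallsAfterConstruction());
    }
}
```

The trade-off in such a design is memory and construction time in exchange for a branch plus an array read on the hot path; if the underlying lookup becomes cheap enough on its own, the array no longer pays for itself, which is the question this issue raises.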