Hi,

The system dictionary is not a mere "word collection"; it includes a machine-learned language model that has been carefully trained by researchers. If you want to replace the system dictionary, you have to start by re-training the model. This requires expert knowledge, so I do not recommend just modifying the CSVs and rebuilding it (unless you have an expert on hand).
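To make that point concrete, here is a rough Python sketch of how a MeCab-style tokenizer picks a segmentation: it searches for the path whose word costs plus connection costs are lowest. Everything in it is invented for illustration (the four entries, the three context ids, every cost, and the `tokenize` helper); it is not Kuromoji's code and the numbers do not come from IPADIC. In a real dictionary those costs come out of model training, which is why hand-edited values tend to cause trouble.

```python
# Toy illustration only: entries, context ids, and costs are invented,
# not taken from IPADIC or any other real dictionary.

BOS_EOS, NOUN, SUFFIX = 0, 1, 2

# (surface, context id, word cost); lower cost means "more likely".
ENTRIES = [
    ("東京", NOUN, 3000),
    ("京都", NOUN, 3000),
    ("東", NOUN, 6000),
    ("都", SUFFIX, 4000),
]

# Cost of placing a token with the second context id right after
# a token with the first context id.
CONNECTION = {
    (BOS_EOS, NOUN): 500,
    (BOS_EOS, SUFFIX): 5000,
    (NOUN, NOUN): 1500,
    (NOUN, SUFFIX): 200,
    (SUFFIX, NOUN): 1000,
    (SUFFIX, SUFFIX): 3000,
    (NOUN, BOS_EOS): 500,
    (SUFFIX, BOS_EOS): 300,
}


def tokenize(text, entries, connection):
    """Return (total_cost, surfaces) for the cheapest path through the lattice."""
    # best[i][ctx] = (cost, path) of the cheapest path that ends at
    # character offset i with a token whose context id is ctx.
    best = [dict() for _ in range(len(text) + 1)]
    best[0][BOS_EOS] = (0, [])
    for start in range(len(text)):
        for prev_ctx, (prev_cost, path) in best[start].items():
            for surface, ctx, word_cost in entries:
                if not text.startswith(surface, start):
                    continue
                end = start + len(surface)
                cost = prev_cost + connection[(prev_ctx, ctx)] + word_cost
                if ctx not in best[end] or cost < best[end][ctx][0]:
                    best[end][ctx] = (cost, path + [surface])
    # Close every surviving path with the end-of-sentence connection cost.
    finished = [
        (cost + connection[(ctx, BOS_EOS)], path)
        for ctx, (cost, path) in best[len(text)].items()
    ]
    if not finished:
        raise ValueError("no segmentation covers the whole input")
    return min(finished)


# With the original costs the cheapest path is 東京 + 都.
print(tokenize("東京都", ENTRIES, CONNECTION))

# Hand-editing a single word cost (東: 6000 -> 100) flips the result to 東 + 京都.
EDITED = [(s, c, 100 if s == "東" else w) for s, c, w in ENTRIES]
print(tokenize("東京都", EDITED, CONNECTION))
```

The second call shows the risk: one cost changed by hand, without re-balancing the rest, is enough to change how an unrelated input is segmented.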
As for "modern words" that are not included in the current system dictionary, there are already a few options.

1. Use the neologd dictionary (an extension of MeCab IPADIC, Kuromoji's default dictionary)
   For Solr: https://github.com/mocobeta/lucene-solr/tree/kuromoji-neologd_5_4_0
   (The branch is mine. It is a little old, but you can cherry-pick the changes to Kuromoji's build.xml.)
   For Elasticsearch: https://github.com/codelibs/elasticsearch-analysis-kuromoji-ipadic-neologd

2. Use the Sudachi dictionary
   For Elasticsearch: https://github.com/WorksApplications/elasticsearch-sudachi
   This includes a Lucene jar, so I think you can extract the jar for Solr (I have never tried using it with Solr).

Both are actively maintained by linguistics & NLP researchers/engineers. Please be careful: those are rather large jars...

Hope that helps.
Tomoko

On Sun, 26 May 2019 at 23:11, Trejkaz <trej...@trypticon.org> wrote:
>
> On Sun, 26 May 2019 at 23:49, Namgyu Kim <kng0...@gmail.com> wrote:
>
> > I think so about that approach.
> > It's not user-friendly and it is not good for the user.
> > I think it's better to get the parameters in
> > JapaneseTokenizer.
> >
> > What do you think about this?
> >
>
> A way to override the system dictionary would be useful for us as well. We
> often get people complaining that the current dictionary is missing a lot
> of common modern words, and there are alternate mecab dictionaries sitting
> around already which solve this problem.
>
> TX