[ https://issues.apache.org/jira/browse/LUCENE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978156#comment-14978156 ]
Christian Moen commented on LUCENE-6837:
----------------------------------------

Thanks a lot for this, Konno-san. Very nice work!

I like the idea of calculating the n-best cost using examples.

Since search mode and extended mode solve a similar problem, I'm wondering whether it makes sense to introduce n-best as a separate mode in itself. In your experience developing the feature, do you think it makes sense to use it with search and extended mode?

I think I'm in favour of supporting it for all the modes, even though it perhaps makes the most sense for normal mode. The reason is to make sure that the entire API for {{JapaneseTokenizer}} is functional in all the tokenizer modes.

I'll add a few tests and I'd like to commit this soon.

> Add N-best output capability to JapaneseTokenizer
> -------------------------------------------------
>
> Key: LUCENE-6837
> URL: https://issues.apache.org/jira/browse/LUCENE-6837
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 5.3
> Reporter: KONNO, Hiroharu
> Priority: Minor
> Attachments: LUCENE-6837.patch
>
>
> Japanese morphological analyzers often generate mis-segmented tokens. N-best
> output reduces the impact of mis-segmentation on search results. N-best output
> is more meaningful than character N-grams, and it increases the hit count too.
> If you use N-best output, you can get decompounded tokens (e.g.
> "シニアソフトウェアエンジニア" => {"シニア", "シニアソフトウェアエンジニア", "ソフトウェア", "エンジニア"}) and
> overlapping tokens (e.g. "数学部長谷川" => {"数学", "部", "部長", "長谷川", "谷川"}),
> depending on the dictionary and N-best parameter settings.
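
To make the overlapping-token example in the issue description concrete, here is a toy sketch of the N-best idea in Python. It is not how the patch or {{JapaneseTokenizer}} is implemented: the word costs in TOY_COSTS are invented for illustration, connection costs between words are ignored, and the function names (segmentations, n_best_tokens) and the margin parameter are hypothetical. The sketch enumerates every dictionary segmentation by brute force, keeps those whose total cost is within a margin of the cheapest one, and emits the union of their tokens.

```python
from typing import Dict, List, Tuple

# Hypothetical word costs (lower is better). These numbers are made up
# for illustration and do not come from any real dictionary.
TOY_COSTS: Dict[str, int] = {
    "数学": 100,
    "部": 150,
    "部長": 200,
    "長谷川": 150,
    "谷川": 200,
}

Token = Tuple[int, str]  # (start offset, surface form)


def segmentations(text: str, costs: Dict[str, int],
                  start: int = 0) -> List[Tuple[List[Token], int]]:
    """Enumerate every split of text[start:] into dictionary words.

    Returns (tokens, total_cost) pairs. Brute force, fine for a toy input.
    """
    if start == len(text):
        return [([], 0)]
    results = []
    for end in range(start + 1, len(text) + 1):
        word = text[start:end]
        if word in costs:
            for rest, rest_cost in segmentations(text, costs, end):
                results.append(([(start, word)] + rest,
                                costs[word] + rest_cost))
    return results


def n_best_tokens(text: str, costs: Dict[str, int], margin: int) -> List[str]:
    """Union of tokens from all segmentations within `margin` of the best cost."""
    paths = segmentations(text, costs)
    if not paths:
        return []
    best = min(cost for _, cost in paths)
    kept = {tok for tokens, cost in paths
            if cost <= best + margin
            for tok in tokens}
    # Order by start offset, then by length, so shorter tokens come first.
    return [word for _, word in sorted(kept, key=lambda t: (t[0], len(t[1])))]


if __name__ == "__main__":
    print(n_best_tokens("数学部長谷川", TOY_COSTS, margin=0))
    print(n_best_tokens("数学部長谷川", TOY_COSTS, margin=100))
```

With a margin of 0 only the cheapest segmentation survives; widening the margin to 100 admits the second segmentation as well, so tokens such as "部長" and "谷川" appear alongside "部" and "長谷川". In this toy, the margin plays the role that the issue calls the N-best parameter setting.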