Satoshi Iijima created HIVEMALL-208:
---------------------------------------
Summary: tokenize_ja failed to analyze certain Japanese strings
Key: HIVEMALL-208
URL: https://issues.apache.org/jira/browse/HIVEMALL-208
Project: Hivemall
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Satoshi Iijima
tokenize_ja failed to analyze certain Japanese strings and output the error below.
{panel}
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)
{panel}
The cause is LUCENE-7279, which has already been fixed. Lucene needs to be upgraded.
The affected versions include v0.4.2 as well as v0.5.0.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)