[ https://issues.apache.org/jira/browse/HIVEMALL-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Makoto Yui reassigned HIVEMALL-208:
-----------------------------------

    Assignee: Makoto Yui

> tokenize_ja failed to analyze certain Japanese strings
> ------------------------------------------------------
>
>                 Key: HIVEMALL-208
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-208
>             Project: Hivemall
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Satoshi Iijima
>            Assignee: Makoto Yui
>            Priority: Minor
>             Fix For: 0.5.2
>
>
> tokenize_ja failed to analyze certain Japanese strings and output the
> following error.
> {panel}
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
>     at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
>     at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
>     at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
>     at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
>     at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
>     at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
>     at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
>     at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
>     at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
>     at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)
> {panel}
> The cause is LUCENE-7279, which has already been fixed, so Lucene needs to be
> upgraded.
> The affected versions include not only v0.5.0 but also v0.4.2.