[ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772050#action_12772050 ]
DM Smith commented on LUCENE-2023:
----------------------------------

Robert,

You have in BigramDictionary:
{code}
public boolean isToExist(int to) {
  return to < tokenPairListTable.length && tokenPairListTable[to] != null;
}
{code}

And you call it in:
{code}
public void addSegTokenPair(SegTokenPair tokenPair) {
  final int to = tokenPair.to;
  if (!isToExist(to)) {
    ArrayList<SegTokenPair> newlist = new ArrayList<SegTokenPair>();
    newlist.add(tokenPair);
    tokenPairListTable[to] = newlist;
    tableSize++;
  } else {
    List<SegTokenPair> tokenPairList = tokenPairListTable[to];
    tokenPairList.add(tokenPair);
  }
}
{code}

The check in addSegTokenPair assumes that isToExist(to) returns false only when "to" is in bounds. If "to" is out of bounds, isToExist(to) also returns false, and the assignment "tokenPairListTable[to] = newlist" will then throw an array bounds exception. Is it an invariant that tokenPair.to will always be in bounds?

The array in SegGraph does the same thing. The former implementation did not have this issue.

Other than that, it looks good.

> Improve performance of SmartChineseAnalyzer
> -------------------------------------------
>
> Key: LUCENE-2023
> URL: https://issues.apache.org/jira/browse/LUCENE-2023
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2023.patch
>
>
> I've noticed SmartChineseAnalyzer is a bit slow, compared to say CJKAnalyzer on chinese text.
> This patch improves the internal hhmm implementation.
> Time to index my chinese corpus is 75% of the previous time.
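
For illustration, the bounds concern raised in the comment above can be reproduced with a small standalone sketch. The class TokenPairTable and its methods addUnchecked and addGrowing are hypothetical names invented for this example, and Integer values stand in for SegTokenPair; this is not the Lucene code, and growing the array is only one possible guard, not necessarily what the patch should do. Indices are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the array-backed table discussed in the comment.
class TokenPairTable {
    private List<Integer>[] table;
    private int tableSize = 0;

    @SuppressWarnings("unchecked")
    TokenPairTable(int capacity) {
        table = (List<Integer>[]) new List[capacity];
    }

    // Same shape as isToExist in the quoted code: false for an empty slot
    // AND for an index past the end of the array.
    boolean isToExist(int to) {
        return to < table.length && table[to] != null;
    }

    // Same shape as addSegTokenPair in the quoted code.
    void addUnchecked(int to, int value) {
        if (!isToExist(to)) {
            List<Integer> newList = new ArrayList<Integer>();
            newList.add(value);
            table[to] = newList; // throws if to >= table.length
            tableSize++;
        } else {
            table[to].add(value);
        }
    }

    // One possible guard (an assumption, not the patch): grow before storing.
    void addGrowing(int to, int value) {
        if (to >= table.length) {
            table = Arrays.copyOf(table, Math.max(to + 1, table.length * 2));
        }
        addUnchecked(to, value);
    }

    int size() {
        return tableSize;
    }

    public static void main(String[] args) {
        TokenPairTable t = new TokenPairTable(2);
        t.addUnchecked(1, 10); // in bounds: fine
        try {
            t.addUnchecked(5, 20); // out of bounds: isToExist is false, store throws
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked add failed for index 5");
        }
        t.addGrowing(5, 20); // guarded variant grows the array first
        System.out.println("entries: " + t.size());
    }
}
```

If tokenPair.to is in fact guaranteed to be in bounds, no guard is needed and the question in the comment reduces to documenting that invariant.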