Thank you for your response. I will have to update my code ^_^
On April 13, 2011 at 7:19 PM, Julien Nioche <[email protected]> wrote:

> Hi,
>
> Nutch has moved away from handling the indexing and search itself and now
> delegates that to SOLR as of versions 1.3 and 2.0 (both forthcoming). The
> issue you described won't be fixed as this part of the code has been
> removed. Users are encouraged to start using 1.3 and use SOLR for the
> indexing and search.
>
> Your comments should be useful to anyone having the same issue with Nutch
> <= 1.2, so thanks for sharing this.
>
> Julien
>
> 2011/4/13 Bupo Jung <[email protected]>
>
>> I use Nutch for Chinese search. When I enter a query string like
>> "可爱的小女生" (a lovely little girl), the Chinese analyzer turns it into
>> three query tokens: 可爱, 小女, 女生. When these tokens are used to build
>> the summary for the result page, a StringIndexOutOfBoundsException is
>> thrown. Here is the error log:
>>
>> 2010-12-15 12:18:43,505 ERROR searcher.NutchBean - Exception occured while
>> executing search: java.lang.RuntimeException:
>> java.util.concurrent.ExecutionException:
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:297)
>>     at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:350)
>>     at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:410)
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:292)
>>     ... 2 more
>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
>> range: -1
>>     at java.lang.String.substring(String.java:1937)
>>     at org.apache.nutch.summary.basic.BasicSummarizer.getSummary(BasicSummarizer.java:188)
>>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:263)
>>     at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:63)
>>     at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:53)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> This happens because the two query tokens "小女" and "女生" overlap.
>>
>> nutch/src/plugin/summary-basic/src/java/org/apache/nutch/summary/basic/BasicSummarizer.java
>> line 188:
>>
>> if (highlight.contains(t.term())) {
>>   excerpt.addToken(t.term());
>>   // when two tokens overlap, offset > t.startOffset()
>>   excerpt.add(new Fragment(text.substring(offset, t.startOffset()))); // this is where the exception occurs
>>   excerpt.add(new Highlight(text.substring(t.startOffset(), t.endOffset())));
>>   offset = t.endOffset();
>>   endToken = Math.min(j + sumContext, tokens.length);
>> }
>>
>> // Changed code to fix the error:
>> if (highlight.contains(t.term())) {
>>   excerpt.addToken(t.term());
>>   // bupo changed the code to fix the Chinese token overlap error 2010.12.15
>>   if (offset < t.startOffset()) {
>>     excerpt.add(new Fragment(text.substring(offset, t.startOffset())));
>>     excerpt.add(new Highlight(text.substring(t.startOffset(), t.endOffset())));
>>   } else {
>>     excerpt.add(new Highlight(text.substring(offset, t.endOffset())));
>>   } // bupo
>> }
>>
>> --
>> Yizhong Zhuang
>> Beijing University of Posts and Telecommunications
>> Email:[email protected]
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com

--
Yizhong Zhuang
Beijing University of Posts and Telecommunications
Email:[email protected]
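
P.S. For anyone still on Nutch <= 1.2, here is a minimal standalone sketch of the overlap handling, without any Nutch classes. The names OverlapSafeExcerpt, Span and mark are hypothetical and only for illustration; square brackets stand in for Highlight and plain text for Fragment. It makes two assumptions that go beyond the patch quoted above: offset is still advanced after each matched token, and a token that ends at or before offset is skipped, since substring(offset, end) would fail for it as well.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSafeExcerpt {

    /** Hypothetical stand-in for a matched query token: [start, end) offsets into the text. */
    public static final class Span {
        final int start;
        final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Wraps matched spans in square brackets. Spans must be sorted by start
     * offset; a span that overlaps the previous one only contributes the part
     * that has not been emitted yet.
     */
    public static String mark(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int offset = 0; // end of the text emitted so far
        for (Span s : spans) {
            if (s.end <= offset) {
                continue; // assumption: token already fully covered, nothing new to emit
            }
            if (offset < s.start) {
                out.append(text, offset, s.start);                          // plain fragment
                out.append('[').append(text, s.start, s.end).append(']');   // highlight
            } else {
                out.append('[').append(text, offset, s.end).append(']');    // overlapping tail only
            }
            offset = s.end; // always advance, otherwise the next token starts from a stale offset
        }
        out.append(text, offset, text.length()); // trailing plain text
        return out.toString();
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 2)); // 可爱
        spans.add(new Span(3, 5)); // 小女
        spans.add(new Span(4, 6)); // 女生, overlaps 小女 by one character
        System.out.println(mark("可爱的小女生", spans)); // prints [可爱]的[小女][生]
    }
}
```

With the original logic, the third token would call substring(5, 4) and throw; here only the uncovered character 生 is added as a highlight.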

