[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617048#comment-13617048 ]
Hudson commented on NUTCH-1547:
-------------------------------
Integrated in Nutch-nutchgora #548 (See [https://builds.apache.org/job/Nutch-nutchgora/548/])
NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 1462079)
Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1462079
Files :
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/conf/nutch-default.xml
* /nutch/branches/2.x/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
> BasicIndexingFilter - Problem to index full title
> -------------------------------------------------
>
> Key: NUTCH-1547
> URL: https://issues.apache.org/jira/browse/NUTCH-1547
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.6, 2.1
> Reporter: Gustavo Rauber
> Assignee: lufeng
> Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I ran into this issue when trying to index the entire title, just like the
> content, by setting indexer.max.title.length to -1 in nutch-default.xml.
> I think the behavior should be the same as for the content, where -1
> means no truncation.
> To fix it, just replace line 90:
> if (title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
> with this one:
> if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
> Stack Trace:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.String.substring(String.java:1937)
> at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
> at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Cheers.
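The proposed guard can be sketched as a standalone Java method. This is a minimal illustration, not the committed patch: the class `TitleTruncation` and the method `truncateTitle` are hypothetical names, and it assumes that any negative limit (such as -1) means "keep the full title". It also shows why the unguarded check fails: with the limit set to -1, `title.length() > -1` is always true, so `substring(0, -1)` is reached and throws the `StringIndexOutOfBoundsException` seen in the stack trace.

```java
// Hypothetical sketch of the NUTCH-1547 guard; not the actual
// BasicIndexingFilter source. Assumption: a negative limit disables truncation.
public class TitleTruncation {

    /** Returns the title cut to maxTitleLength characters, unless the limit is negative. */
    static String truncateTitle(String title, int maxTitleLength) {
        // Check the limit first: without "maxTitleLength > -1", a limit of -1
        // makes the length test always true and substring(0, -1) throws.
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        String title = "Apache Nutch - Highly extensible, highly scalable Web crawler";
        System.out.println(truncateTitle(title, 10)); // truncated to 10 characters
        System.out.println(truncateTitle(title, -1)); // full title kept

        // Reproduce the original failure: the unguarded code path ends up
        // calling substring with an end index of -1.
        try {
            title.substring(0, -1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded check fails with " + e.getClass().getSimpleName());
        }
    }
}
```

With the guard in place, -1 behaves like the content limit described above, while non-negative limits keep truncating as before.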
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira