[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442827#comment-13442827 ]
Christian Johnsson edited comment on NUTCH-1448 at 8/28/12 11:48 AM:
---------------------------------------------------------------------

Is this related to the problem I'm starting to get? I figure there is some bad input in HBase, but I can't find it :-(

2012-08-28 01:48:10,871 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:98)
	at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:102)
	at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
2012-08-28 01:48:10,875 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

was (Author: mr.johnsson):
Is there a patch yet to fix this?
I'm starting to get failures because of this sucker :-)

> Redirected urls should be handled more cleanly (more like an outlink url)
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-1448
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1448
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ferdy Galema
>             Fix For: 2.1
>
>
> This is specifically for Nutch 2.x. Handling a redirect URL like an outlink
> is much cleaner because it makes it simpler to trace how new URLs
> are added to the webpage database. Instant fetching of redirects won't work,
> but this is a small price to pay. (Note that this currently does not work at
> all, because the http.max.redirect property has no effect.) Will be attaching
> a patch in the upcoming days.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
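
For anyone hitting the same trace: an `ArrayIndexOutOfBoundsException: 1` thrown from `unreverseUrl` suggests that an array with a single element was indexed at position 1, which is what would happen if a row key had no `:` separator before its path. The following is only a rough sketch of a defensive parse, not the code in Nutch's `TableUtil`. It assumes row keys look like `com.example.www:http:8080/index.html` (reversed host, protocol, optional port, then path). The class name `ReversedKeySketch` and the method `tryUnreverse` are hypothetical and exist only for this illustration.

```java
import java.util.Optional;

// Illustrative sketch only; not the Nutch TableUtil implementation.
final class ReversedKeySketch {

    /**
     * Rebuilds a URL from a key ASSUMED to have the layout
     * "reversed.host:protocol[:port]/path". Returns Optional.empty()
     * for a malformed key instead of throwing.
     */
    public static Optional<String> tryUnreverse(String key) {
        if (key == null) {
            return Optional.empty();
        }
        // Everything before the first '/' is host, protocol and optional port.
        int pathBegin = key.indexOf('/');
        if (pathBegin == -1) {
            pathBegin = key.length();
        }
        // Limit -1 keeps empty tokens so "host:" is not silently shortened.
        String[] parts = key.substring(0, pathBegin).split(":", -1);

        // A key without ':' yields a one-element array; reading parts[1]
        // unchecked is the kind of access that fails with index 1.
        if (parts.length < 2 || parts.length > 3
                || parts[0].isEmpty() || parts[1].isEmpty()) {
            return Optional.empty();
        }

        StringBuilder url = new StringBuilder(parts[1]).append("://");
        // Undo the host reversal: com.example.www -> www.example.com
        String[] labels = parts[0].split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            url.append(labels[i]);
            if (i > 0) {
                url.append('.');
            }
        }
        if (parts.length == 3) {
            url.append(':').append(parts[2]);
        }
        url.append(key.substring(pathBegin));
        return Optional.of(url.toString());
    }

    public static void main(String[] args) {
        System.out.println(tryUnreverse("com.example.www:http:8080/index.html"));
        System.out.println(tryUnreverse("not-a-reversed-key/path"));
    }
}
```

Running a check like this over the row keys in the table (and logging the ones that come back empty) might be one way to locate the bad input mentioned above, under the stated assumption about the key layout.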