[ https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446490#comment-13446490 ]
Christian Johnsson edited comment on NUTCH-1461 at 9/1/12 10:42 AM:
--------------------------------------------------------------------
Quick fix in case there are some invalid domains in the database.
It prevents the job from crashing.
The best approach would be to ignore the invalid entry entirely and skip to the next one,
but my Java skills are quite limited :-)
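
The "skip to the next entry" idea could look roughly like the sketch below. It is not the attached patch and not Nutch code: `SafeUnreverse` and `tryUnreverse` are hypothetical names, and the key layout (`reversed.host:protocol[:port]/path`) is an assumption inferred from the `ArrayIndexOutOfBoundsException: 1` in the stack trace quoted below, which suggests a key that split into fewer parts than expected.

```java
import java.util.Optional;

public class SafeUnreverse {

    /**
     * Hypothetical helper, not part of Nutch.
     * Assumes keys look like "com.example.www:http[:8080]/path".
     * Returns Optional.empty() for keys that do not fit that shape,
     * so the caller can skip them instead of crashing.
     */
    static Optional<String> tryUnreverse(String key) {
        if (key == null) {
            return Optional.empty();
        }
        int pathBegin = key.indexOf('/');
        if (pathBegin == -1) {
            pathBegin = key.length();
        }
        // The -1 limit keeps empty trailing tokens, so "host:" is seen as two parts.
        String[] parts = key.substring(0, pathBegin).split(":", -1);
        // Expect host and protocol, optionally a port; anything else is malformed.
        if (parts.length < 2 || parts.length > 3
                || parts[0].isEmpty() || parts[1].isEmpty()) {
            return Optional.empty();
        }
        StringBuilder url = new StringBuilder(parts[1]).append("://");
        // Append the host labels in reverse order: com.example.www -> www.example.com
        String[] labels = parts[0].split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            url.append(labels[i]);
            if (i > 0) {
                url.append('.');
            }
        }
        if (parts.length == 3) {
            url.append(':').append(parts[2]);
        }
        url.append(key.substring(pathBegin));
        return Optional.of(url.toString());
    }

    public static void main(String[] args) {
        String[] keys = {"com.example.www:http/index.html", "not-a-valid-key"};
        for (String key : keys) {
            Optional<String> url = tryUnreverse(key);
            if (url.isPresent()) {
                System.out.println(url.get());
            } else {
                // In a mapper this is where the row would be skipped.
                System.out.println("skipping malformed key: " + key);
            }
        }
    }
}
```

Returning an empty `Optional` rather than throwing leaves the decision (skip, log, count) to the caller, which is probably what a mapper wants.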
was (Author: mr.johnsson):
Quick fix in case there are some invalid domains in the database.
It will prevent it from crashing.
> Problem with TableUtil
> ----------------------
>
> Key: NUTCH-1461
> URL: https://issues.apache.org/jira/browse/NUTCH-1461
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: nutchgora
> Environment: Debian / CDH3 / Nutch 2.0 Release
> Reporter: Christian Johnsson
> Attachments: regex-urlfilter.txt, TabelUtil_Fix.patch
>
>
> Affects parse and updatedb.
> I think I got some malformed URLs into HBase, but I can't find them.
> It generates the error below, though. If I empty HBase and restart, it runs for a
> couple of million indexed pages, then the error comes up again. Any tips on how to
> locate which row in the table generates this error?
> 2012-08-28 01:48:10,871 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:98)
> at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:102)
> at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> at org.apache.hadoop.mapred.Child.main(Child.java:260)
> 2012-08-28 01:48:10,875 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
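
On the question of locating the row that triggers the error: one option is to run the conversion over the keys inside a try/catch and collect every key that throws, rather than letting the first failure kill the task. The sketch below is only an illustration under assumptions. `FindBadKeys`, `findFailingKeys`, and `naiveUnreverse` are hypothetical names, and `naiveUnreverse` is a deliberately simplified stand-in that reproduces the failure mode (indexing element 1 after splitting on `:`) without reversing the host; in a real job the function passed in would be the actual conversion, and the keys would come from the table scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FindBadKeys {

    /**
     * Hypothetical helper: applies the given conversion to each key and
     * returns the keys for which it throws a RuntimeException.
     */
    static List<String> findFailingKeys(Iterable<String> keys,
                                        Function<String, String> convert) {
        List<String> bad = new ArrayList<>();
        for (String key : keys) {
            try {
                convert.apply(key);
            } catch (RuntimeException e) {
                // Record the offending key instead of aborting the whole run.
                bad.add(key);
            }
        }
        return bad;
    }

    /**
     * Simplified stand-in for the real conversion (assumption, not Nutch code).
     * It throws ArrayIndexOutOfBoundsException when the part before the
     * first '/' contains no ':' separator.
     */
    static String naiveUnreverse(String key) {
        String[] parts = key.split("/", 2)[0].split(":");
        return parts[1] + "://" + parts[0];
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "com.example.www:http/index.html",
                "no-separator-here",
                "org.example:https/");
        for (String bad : findFailingKeys(keys, FindBadKeys::naiveUnreverse)) {
            System.out.println("malformed row key: " + bad);
        }
    }
}
```

Inside a mapper the same pattern reduces to catching the exception around the call and logging the key, so the task log names the row.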
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira