[ 
https://issues.apache.org/jira/browse/NUTCH-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362998#comment-17362998
 ] 

ASF GitHub Bot commented on NUTCH-2868:
---------------------------------------

sebastian-nagel merged pull request #649:
URL: https://github.com/apache/nutch/pull/649


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> urlnormalizer-protocol fails with StringIndexOutOfBoundsException when 
> reading invalid line in configuration file
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2868
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2868
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, urlnormalizer
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> When reading an invalid line in the configuration file, the protocol 
> urlnormalizer may fail with a StringIndexOutOfBoundsException:
> {noformat}
> 2021-06-10 05:10:41,877 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1967)
>         at org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer.readConfiguration(ProtocolURLNormalizer.java:95)
>         at org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer.setConf(ProtocolURLNormalizer.java:182)
> {noformat}
> The invalid line should be logged and skipped without causing the job to fail.
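
The "index out of range: -1" in the trace is what String.substring reports when it is handed the -1 that indexOf returns for a missing delimiter. The following is a minimal sketch of the "log and skip" behaviour requested above, not the actual change merged for this issue. It assumes a configuration format of one "host<whitespace>protocol" pair per line with "#" comments, which the report does not state; the class and method names are hypothetical, and System.err stands in for a real logger.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the configuration reader; not Nutch code.
class ConfigLineSketch {

  // Returns the index of the first whitespace character, or -1 if none.
  static int firstWhitespace(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (Character.isWhitespace(s.charAt(i))) {
        return i;
      }
    }
    return -1;
  }

  // Parses "host<whitespace>protocol" lines (assumed format).
  // Lines without a delimiter are reported and skipped instead of
  // being passed to substring(), which would throw for index -1.
  static Map<String, String> parseRules(List<String> lines) {
    Map<String, String> rules = new LinkedHashMap<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // blank line or comment
      }
      int delimiterIndex = firstWhitespace(line);
      if (delimiterIndex == -1) {
        // A real plugin would use its logger here.
        System.err.println("Skipping invalid line: " + line);
        continue;
      }
      String host = line.substring(0, delimiterIndex);
      String protocol = line.substring(delimiterIndex + 1).trim();
      rules.put(host, protocol);
    }
    return rules;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "# host <tab> protocol",
        "example.org\thttps",
        "line-without-delimiter",
        "example.com http");
    System.out.println(parseRules(lines));
  }
}
```

Checking the delimiter index before calling substring keeps a single malformed line from failing the whole job, at the cost of silently dropping that rule unless the warning is noticed in the logs.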



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
