[ https://issues.apache.org/jira/browse/NUTCH-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939029#comment-17939029 ]
Sebastian Nagel commented on NUTCH-3109:
----------------------------------------

Hi [~markus], could you share more context (Java version, Hadoop version, etc.)? The exception looks very odd. I'm unable to reproduce the issue:
- [ArrayIndexOutOfBoundsException|https://docs.oracle.com/javase/8/docs/api/java/lang/ArrayIndexOutOfBoundsException.html] inherits toString() from Throwable
- tried the following with Java 8, 11, 17, 21:
{code}
int[] a = { };
try {
  System.out.println("" + a[0]);
} catch (Exception e) {
  System.err.println("Got: " + e);
}
{code}
- it just works
- the code should use parameterized logging ({{LOG.warn("Skipping {}: ", url, e);}}), but that is not the solution: it would fail the same way

> Unable to update CrawlDB due to URL normalization
> -------------------------------------------------
>
> Key: NUTCH-3109
> URL: https://issues.apache.org/jira/browse/NUTCH-3109
> Project: Nutch
> Issue Type: Bug
> Reporter: Markus Jelsma
> Priority: Major
>
> I routinely added new normalization rules in a custom normalizer plugin,
> nothing out of the ordinary. Updating the CrawlDB with -normalize just got me
> this:
> {code:java}
> 2025-03-24 08:01:23,166 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.AbstractMethodError: Receiver class java.lang.ArrayIndexOutOfBoundsException does not define or inherit an implementation of the resolved method 'java.lang.String toString()' of class java.lang.Object.
> 	at java.base/java.lang.String.valueOf(String.java:2951)
> 	at java.base/java.lang.StringBuilder.append(StringBuilder.java:172)
> 	at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:101)
> 	at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:37)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:800)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
> {code}
>
> The offending line of code is the LOG.warn below:
>
> {code:java}
>       if (url != null && urlNormalizers) {
>         try {
>           url = normalizers.normalize(url, scope); // normalize the url
>         } catch (Exception e) {
>           LOG.warn("Skipping " + url + ":" + e);
>           url = null;
>         }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
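To check the failing catch-and-log pattern from CrawlDbFilter outside Hadoop, the following is a minimal, self-contained sketch, not Nutch code. Everything in it is an assumption for illustration: the class NormalizeSkipRepro and the methods normalize, normalizeOrSkip and skipMessage are hypothetical, normalize stands in for a custom normalizer plugin that throws an ArrayIndexOutOfBoundsException, and java.util.logging replaces the logger Nutch actually uses. On a healthy JVM the string concatenation succeeds, consistent with the reproduction in the comment; the sketch does not reproduce the AbstractMethodError itself.

{code:java}
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the catch-and-log pattern from CrawlDbFilter.map(),
 * runnable outside Hadoop. Not Nutch code.
 */
public class NormalizeSkipRepro {

  // Assumption: java.util.logging stands in for Nutch's logger.
  private static final Logger LOG = Logger.getLogger(NormalizeSkipRepro.class.getName());

  // Hypothetical normalizer: always fails with an out-of-bounds array access.
  static String normalize(String url) {
    int[] empty = {};
    return url + empty[0]; // throws ArrayIndexOutOfBoundsException
  }

  // Same concatenation as the original LOG.warn call; this invokes e.toString(),
  // which ArrayIndexOutOfBoundsException inherits from Throwable.
  static String skipMessage(String url, Exception e) {
    return "Skipping " + url + ":" + e;
  }

  // Mirrors the original catch block: log the failure and skip the URL (return null).
  static String normalizeOrSkip(String url) {
    try {
      return normalize(url);
    } catch (Exception e) {
      LOG.warning(skipMessage(url, e));
      return null;
    }
  }

  public static void main(String[] args) {
    String result = normalizeOrSkip("http://example.com/");
    System.out.println("normalized URL: " + result);
  }
}
{code}

With Java 11 or later the file can be launched directly with {{java NormalizeSkipRepro.java}}. The warning goes to stderr through the default console handler, and the exception text in it differs by Java version (the index message changed after Java 8), so only the "Skipping <url>:java.lang.ArrayIndexOutOfBoundsException" prefix is stable.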