[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577288#comment-17577288 ]

Hudson commented on NUTCH-2936:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #78 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/78/])
NUTCH-2936 Early registration of URL stream handlers provided by plugins may 
fail Hadoop jobs running in distributed mode if protocol-okhttp is used 
(snagel: 
[https://github.com/apache/nutch/commit/03e0ffda4e0c7a31c033541e937a742fe798608a])
* (edit) 
src/plugin/protocol-okhttp/src/java/org/apache/nutch/protocol/okhttp/OkHttp.java
NUTCH-2936 Early registration of URL stream handlers provided by plugins may 
fail Hadoop jobs running in distributed mode if protocol-okhttp is used 
(snagel: 
[https://github.com/apache/nutch/commit/1f5f3e4d42b8dfb8bf741b11c9f39cc8bcd34091])
* (edit) src/java/org/apache/nutch/plugin/Extension.java
* (edit) src/java/org/apache/nutch/plugin/PluginRepository.java
* (edit) src/java/org/apache/nutch/plugin/Plugin.java
* (edit) src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java
NUTCH-2936 Early registration of URL stream handlers provided by plugins may 
fail Hadoop jobs (snagel: 
[https://github.com/apache/nutch/commit/487110b07a8b085c5546b58a2157268b3d21cb19])
* (edit) src/java/org/apache/nutch/plugin/PluginRepository.java
* (edit) src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java


> Early registration of URL stream handlers provided by plugins may fail Hadoop 
> jobs running in distributed mode if protocol-okhttp is used
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2936
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2936
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.19
>            Reporter: Sebastian Nagel
>            Assignee: Lewis John McGibbney
>            Priority: Blocker
>             Fix For: 1.19
>
>
> After merging NUTCH-2429 I've observed that Nutch jobs running in distributed 
> mode may fail early with the following dubious error:
> {noformat}
> 2022-01-14 13:11:45,751 ERROR crawl.DedupRedirectsJob: DeduplicationJob: java.io.IOException: Error generating shuffle secret key
>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:182)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
>         at java.base/java.security.AccessController.doPrivileged(Native Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1583)
>         at org.apache.nutch.crawl.DedupRedirectsJob.run(DedupRedirectsJob.java:301)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.nutch.crawl.DedupRedirectsJob.main(DedupRedirectsJob.java:379)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.security.NoSuchAlgorithmException: HmacSHA1 KeyGenerator not available
>         at java.base/javax.crypto.KeyGenerator.<init>(KeyGenerator.java:177)
>         at java.base/javax.crypto.KeyGenerator.getInstance(KeyGenerator.java:244)
>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:179)
>         ... 16 more
> {noformat}
> After removing the early registration of URL stream handlers (see NUTCH-2429) 
> in NutchJob and NutchTool, the job starts without errors.
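In Java, registering URL stream handlers means installing a JVM-wide java.net.URLStreamHandlerFactory with URL.setURLStreamHandlerFactory. The following is a minimal sketch of that JDK API only; it is not Nutch's URLStreamHandlerFactory and not the NUTCH-2936 fix. The protocol nutchdemo and the classes DemoHandler, SelectiveHandlerFactory and HandlerDemo are hypothetical. The factory answers only for the protocol it knows and returns null otherwise, which lets the JDK continue its normal lookup and end up at its built-in handlers. Because a second call to URL.setURLStreamHandlerFactory throws an Error, registration is guarded so it happens at most once per JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

/** Hypothetical handler for the made-up "nutchdemo" protocol (stands in for a plugin-provided one). */
final class DemoHandler extends URLStreamHandler {
  @Override
  protected URLConnection openConnection(URL u) {
    return new URLConnection(u) {
      @Override
      public void connect() {
        // Nothing to connect to: the content is synthesized in getInputStream().
      }

      @Override
      public InputStream getInputStream() {
        String body = "demo content for " + getURL().getPath();
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
      }
    };
  }
}

/** Hypothetical factory: handles only the protocols it knows, returns null for all others. */
final class SelectiveHandlerFactory implements URLStreamHandlerFactory {
  private static boolean registered = false;

  @Override
  public URLStreamHandler createURLStreamHandler(String protocol) {
    if ("nutchdemo".equals(protocol)) {
      return new DemoHandler();
    }
    // null lets the JDK continue its normal lookup, ending with the built-in
    // handlers (http, https, jar, ...).
    return null;
  }

  /** URL.setURLStreamHandlerFactory throws an Error if a factory is already set, so guard it. */
  static synchronized void registerOnce() {
    if (!registered) {
      URL.setURLStreamHandlerFactory(new SelectiveHandlerFactory());
      registered = true;
    }
  }
}

final class HandlerDemo {
  public static void main(String[] args) throws IOException {
    SelectiveHandlerFactory.registerOnce();
    URL demo = URI.create("nutchdemo://host/page").toURL();
    try (InputStream in = demo.openStream()) {
      System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
    // Built-in protocols are unaffected because the factory returned null for them.
    System.out.println(URI.create("https://nutch.apache.org/").toURL().getProtocol());
  }
}
```

Running HandlerDemo prints "demo content for /page" followed by "https", showing that the custom protocol resolves while built-in protocols still use the JDK defaults.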
> Notes:
> - the job on which this error was observed is a [custom de-duplication job|https://github.com/commoncrawl/nutch/blob/cc/src/java/org/apache/nutch/crawl/DedupRedirectsJob.java]
> that flags redirects pointing to the same target URL, but I'll try to reproduce
> it with a standard Nutch job and in pseudo-distributed mode.
> - we should also verify whether registering URL stream handlers works at all in
> distributed mode: tasks are launched differently, not via NutchJob or NutchTool.
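One way to check the second note, assuming nothing about how Nutch itself handles it: a JVM can probe whether a protocol resolves to a handler by constructing a URL and catching MalformedURLException ("unknown protocol"). ProtocolProbe is a hypothetical diagnostic helper, not part of Nutch or Hadoop. Calling hasHandler from, for example, a mapper's setup() method and logging the result would show whether a plugin-provided protocol is resolvable inside a task JVM, which is launched separately from the job client.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical diagnostic helper, not part of Nutch or Hadoop. */
final class ProtocolProbe {
  private ProtocolProbe() {}

  /**
   * Returns true if the current JVM resolves a URLStreamHandler for the given
   * protocol; false if URL construction fails with "unknown protocol".
   */
  static boolean hasHandler(String protocol) {
    try {
      URI.create(protocol + "://example.org/").toURL();
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Protocols to probe: command-line arguments, or a small default set.
    String[] protocols = args.length > 0 ? args : new String[] {"http", "file", "nutchdemo"};
    for (String p : protocols) {
      System.out.println(p + " -> " + (hasHandler(p) ? "handler found" : "unknown protocol"));
    }
  }
}
```

Run without arguments in a plain JVM, http and file report a handler while nutchdemo reports an unknown protocol, since nothing has registered a handler for it.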



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
