[
https://issues.apache.org/jira/browse/NUTCH-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2395:
-----------------------------------
Fix Version/s: 2.4
> Cannot run job worker! - error while running multiple crawling jobs in
> parallel
> -------------------------------------------------------------------------------
>
> Key: NUTCH-2395
> URL: https://issues.apache.org/jira/browse/NUTCH-2395
> Project: Nutch
> Issue Type: Bug
> Components: nutch server
> Affects Versions: 2.3.1
> Environment: Ubuntu 16.04 64-bit
> Oracle Java 8 64-bit
> Nutch 2.3.1 (standalone deployment)
> MongoDB 3.4
> Reporter: Vyacheslav Pascarel
> Priority: Major
> Fix For: 2.4
>
>
> Cannot run job worker! - error while running multiple crawling jobs in parallel
> Ubuntu 16.04 64-bit
> Oracle Java 8 64-bit
> Nutch 2.3.1 (standalone deployment)
> MongoDB 3.4
> My application is trying to execute multiple Nutch jobs in parallel using
> Nutch REST services. The application injects a seed URL and then repeats the
> GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times to emulate
> continuous crawling (each step in the sequence is executed upon successful
> completion of the previous step, then the whole sequence is repeated again).
> Here is a brief description of the jobs:
> * Number of parallel jobs: 7
> * Each job has unique crawl id and MongoDB collection
> * Seed URL for all jobs: http://www.cnn.com
> * Regex URL filters for all jobs:
> ** *"-^.\{1000,\}$"* - exclude very long URLs
> ** *"+."* - include the rest
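The inject-then-loop sequence above can be sketched as a small REST client. This is an illustrative sketch only: the `/job/create` path and the request fields mirror the job-status JSON shown later in this report (crawlId, confId, type, args), but the server URL, port, and exact API behavior are assumptions to verify against your Nutch deployment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: drives one crawl cycle against a Nutch 2.x REST
// server. Endpoint path and JSON shape are inferred from the job-status
// dump in this report; adjust host/port/paths to your deployment.
public class CrawlCycleSketch {

    // Build the JSON body for a job-create request (pure helper, no I/O).
    static String jobRequest(String crawlId, String confId, String type, int topN) {
        return String.format(
            "{\"crawlId\":\"%s\",\"confId\":\"%s\",\"type\":\"%s\",\"args\":{\"topN\":\"%d\"}}",
            crawlId, confId, type, topN);
    }

    // POST a JSON body and return the raw response text.
    static String post(String server, String path, String json) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(server + path).openConnection();
        c.setRequestMethod("POST");
        c.setRequestProperty("Content-Type", "application/json");
        c.setDoOutput(true);
        try (OutputStream os = c.getOutputStream()) {
            os.write(json.getBytes("UTF-8"));
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
            StringBuilder sb = new StringBuilder();
            for (String line; (line = r.readLine()) != null; ) sb.append(line);
            return sb.toString();
        }
    }

    // One GENERATE/FETCH/PARSE/UPDATEDB round. Each step is submitted only
    // after the previous request returned, mirroring the report's loop; a
    // real client would also poll the returned job id until SUCCESS before
    // moving on.
    static void oneRound(String server, String crawlId, String confId) throws IOException {
        for (String type : new String[] {"GENERATE", "FETCH", "PARSE", "UPDATEDB"}) {
            post(server, "/job/create", jobRequest(crawlId, confId, type, 100));
        }
    }
}
```

Running seven such loops concurrently, each with its own crawlId and confId, reproduces the setup described above.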
> The jobs are started as expected but at some point some of them fail with
> "Cannot run job worker!" error. For more details see job status and
> hadoop.log lines below.
> In the debugger during the crash I noticed that a single instance of
> SelectorEntryComparator (defined as a nested class in GeneratorJob) is shared
> across multiple reducer tasks. The class inherits from
> org.apache.hadoop.io.WritableComparator, which has a few members unprotected
> against concurrent usage. At some point multiple threads may access those
> members in WritableComparator.compare calls. I modified SelectorEntryComparator
> and it seems to have solved the problem, but I am not sure if the change is
> appropriate and/or sufficient (it may cover GENERATE only?).
> Original code:
> {code:java}
> public static class SelectorEntryComparator extends WritableComparator {
>   public SelectorEntryComparator() {
>     super(SelectorEntry.class, true);
>   }
> }
> {code}
> Modified code:
> {code:java}
> public static class SelectorEntryComparator extends WritableComparator {
>   public SelectorEntryComparator() {
>     super(SelectorEntry.class, true);
>   }
>
>   @Override
>   public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>     return super.compare(b1, s1, l1, b2, s2, l2);
>   }
> }
> {code}
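If serializing every compare call through one lock ever proves costly with many local reducer threads, another direction is per-thread scratch state. The sketch below is plain Java, NOT Hadoop code — in the real comparator the shared mutable state is WritableComparator's internal DataInputBuffer, which a subclass cannot easily make thread-local — so it only illustrates the pattern the reporter's race runs into and how per-thread buffers avoid it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of a lock-free alternative: give every thread its own
// scratch buffer instead of synchronizing on one shared instance.
public class ThreadLocalComparator {

    // Per-thread scratch buffer replaces a single shared mutable field.
    private static final ThreadLocal<ByteBuffer> SCRATCH =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(8));

    // Compare two 8-byte big-endian longs, deserializing through the
    // calling thread's own buffer, so concurrent calls cannot interfere.
    public static int compare(byte[] a, byte[] b) {
        ByteBuffer buf = SCRATCH.get();
        buf.clear();
        buf.put(a).flip();
        long x = buf.getLong();
        buf.clear();
        buf.put(b).flip();
        long y = buf.getLong();
        return Long.compare(x, y);
    }

    public static void main(String[] args) throws Exception {
        // Hammer compare() from 7 threads (the report's job count); with
        // per-thread scratch state every call returns a consistent answer.
        byte[] one = ByteBuffer.allocate(8).putLong(1L).array();
        byte[] two = ByteBuffer.allocate(8).putLong(2L).array();
        ExecutorService pool = Executors.newFixedThreadPool(7);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            results.add(pool.submit(() -> {
                for (int j = 0; j < 100_000; j++) {
                    if (compare(one, two) >= 0 || compare(two, one) <= 0) {
                        return false;
                    }
                }
                return true;
            }));
        }
        boolean ok = true;
        for (Future<Boolean> f : results) {
            ok &= f.get();
        }
        pool.shutdown();
        System.out.println(ok ? "all comparisons consistent" : "race detected");
    }
}
```

Since pushing thread-local buffers into WritableComparator would require changing Hadoop itself, the synchronized override shown above remains the practical fix at the Nutch level.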
> Example of failed job status:
> {code}
> {
>   "id" : "parallel_0-65ff2f1b-382e-4eb2-a813-a0370b84d5b6-GENERATE-1961495833",
>   "type" : "GENERATE",
>   "confId" : "65ff2f1b-382e-4eb2-a813-a0370b84d5b6",
>   "args" : { "topN" : "100" },
>   "result" : null,
>   "state" : "FAILED",
>   "msg" : "ERROR: java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036",
>   "crawlId" : "parallel_0"
> }
> {code}
> Lines from hadoop.log:
> {code}
> 2017-06-21 11:45:13,021 WARN mapred.LocalJobRunner - job_local1142434549_0036
> java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.RuntimeException: java.io.EOFException
>     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
>     at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:197)
>     at org.apache.hadoop.io.Text.readString(Text.java:466)
>     at org.apache.hadoop.io.Text.readString(Text.java:457)
>     at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
>     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
>     ... 12 more
> 2017-06-21 11:45:13,058 WARN mapred.LocalJobRunner - job_local1976432650_0038
> java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.RuntimeException: java.io.EOFException
>     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1245)
>     at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
>     at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
>     at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1575)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:267)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>     at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
>     at org.apache.hadoop.io.Text.readString(Text.java:464)
>     at org.apache.hadoop.io.Text.readString(Text.java:457)
>     at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
>     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
>     ... 15 more
> {code}
> {code}
> 2017-06-21 11:45:13,372 ERROR impl.JobWorker - Cannot run job worker!
> java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
>     at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
>     at org.apache.nutch.api.impl.JobWorker.run(JobWorker.java:64)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)