Vyacheslav Pascarel created NUTCH-2395:
------------------------------------------

             Summary: Cannot run job worker! - error while running multiple crawling jobs in parallel
                 Key: NUTCH-2395
                 URL: https://issues.apache.org/jira/browse/NUTCH-2395
             Project: Nutch
          Issue Type: Bug
          Components: nutch server
    Affects Versions: 2.3.1
         Environment: Ubuntu 16.04 64-bit
Oracle Java 8 64-bit
Nutch 2.3.1 (standalone deployment)
MongoDB 3.4

            Reporter: Vyacheslav Pascarel





My application executes multiple Nutch jobs in parallel using the Nutch REST 
services. The application injects a seed URL and then repeats the 
GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times to emulate 
continuous crawling (each step in the sequence is executed upon successful 
completion of the previous step, then the whole sequence is repeated again). 
Here is a brief description of the jobs:
* Number of parallel jobs: 7
* Each job has unique crawl id and MongoDB collection
* Seed URL for all jobs: http://www.cnn.com
* Regex URL filters for all jobs: 
** *"-^.\{1000,\}$"* - exclude very long URLs
** *"+."* - include the rest
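
For reference, the crawl cycle described above can be sketched in plain Java. This is a hypothetical model, not the real driver: {{runCycle}} just records the job types in order, where the real application issues one Nutch REST request per step and waits for it to complete before issuing the next.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CrawlLoop {
    // Models the sequence the application drives via the REST API:
    // INJECT once, then GENERATE/FETCH/PARSE/UPDATEDB repeated `rounds` times.
    // Each entry stands in for a REST job request that must finish
    // successfully before the next one is issued.
    static List<String> runCycle(int rounds) {
        List<String> steps = new ArrayList<>();
        steps.add("INJECT"); // seed URL injected once per crawl id
        for (int i = 0; i < rounds; i++) {
            for (String job : new String[] {"GENERATE", "FETCH", "PARSE", "UPDATEDB"}) {
                steps.add(job);
            }
        }
        return steps;
    }
}
{code}

Seven such loops run concurrently in my setup, each with its own crawl id and MongoDB collection.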

The jobs start as expected but at some point some of them fail with the 
"Cannot run job worker!" error. For more details see the job status and 
hadoop.log lines below.

In the debugger during the crash I noticed that a single instance of 
SelectorEntryComparator (its definition is nested in GeneratorJob) is shared 
across multiple reducer tasks. The class inherits from 
org.apache.hadoop.io.WritableComparator, which has a few members unprotected 
against concurrent use. At some point multiple threads may access those members 
in a WritableComparator.compare call. I modified SelectorEntryComparator and it 
seems to have solved the problem, but I am not sure whether the change is 
appropriate and/or sufficient (does it cover GENERATE only?)

Original code:
{code:java}
public static class SelectorEntryComparator extends WritableComparator {
    public SelectorEntryComparator() {
      super(SelectorEntry.class, true);
    }
}
{code}

Modified code:
{code:java}
public static class SelectorEntryComparator extends WritableComparator {
    public SelectorEntryComparator() {
      super(SelectorEntry.class, true);
    }

    @Override
    public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      return super.compare(b1, s1, l1, b2, s2, l2);
    }
}
{code}
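
To illustrate why the unsynchronized version is hazardous, here is a minimal model of the pattern, not Hadoop code itself: a comparator that deserializes both keys into shared instance fields before comparing. WritableComparator uses the analogous shared members (an internal buffer and two reusable key instances), so two threads interleaving inside compare can reset the shared buffer mid-read, which would surface exactly as the EOFException in SelectorEntry.readFields seen below.

{code:java}
import java.nio.charset.StandardCharsets;

public class SharedStateComparator {
    // Shared mutable state, reused by every compare() call -- the same
    // pattern as WritableComparator's internal buffer/key1/key2 members.
    private String key1;
    private String key2;

    public int compare(byte[] b1, byte[] b2) {
        // Without synchronization, a second thread can overwrite key1/key2
        // between these statements, so this call may compare mismatched
        // (or half-deserialized) keys.
        key1 = new String(b1, StandardCharsets.UTF_8);
        key2 = new String(b2, StandardCharsets.UTF_8);
        return key1.compareTo(key2);
    }
}
{code}

Making compare synchronized serializes access to the shared fields, which matches the fix above; an alternative would be giving each task its own comparator instance, but I have not tried that.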

Example of failed job status:
{code}
{
"id" : "parallel_0-65ff2f1b-382e-4eb2-a813-a0370b84d5b6-GENERATE-1961495833",
"type" : "GENERATE",
"confId" : "65ff2f1b-382e-4eb2-a813-a0370b84d5b6",
"args" : { "topN" : "100" },
"result" : null,
"state" : "FAILED",
"msg" : "ERROR: java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036",
"crawlId" : "parallel_0"
}
{code}

Lines from hadoop.log:

{code}
2017-06-21 11:45:13,021 WARN  mapred.LocalJobRunner - job_local1142434549_0036
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
                at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: java.io.EOFException
                at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
                at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
                at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
                at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
                at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
                at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
                at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
                at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
                at java.io.DataInputStream.readFully(DataInputStream.java:197)
                at org.apache.hadoop.io.Text.readString(Text.java:466)
                at org.apache.hadoop.io.Text.readString(Text.java:457)
                at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
                at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
                ... 12 more
2017-06-21 11:45:13,058 WARN  mapred.LocalJobRunner - job_local1976432650_0038
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
                at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.io.EOFException
                at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
                at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1245)
                at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
                at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
                at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
                at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1575)
                at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
                at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
                at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
                at java.io.DataInputStream.readByte(DataInputStream.java:267)
                at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
                at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
                at org.apache.hadoop.io.Text.readString(Text.java:464)
                at org.apache.hadoop.io.Text.readString(Text.java:457)
                at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
                at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
                ... 15 more
{code}

{code}
2017-06-21 11:45:13,372 ERROR impl.JobWorker - Cannot run job worker!
java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036
                at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
                at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
                at org.apache.nutch.api.impl.JobWorker.run(JobWorker.java:64)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
