[ https://issues.apache.org/jira/browse/NUTCH-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2395:
-----------------------------------
    Fix Version/s: 2.4

> Cannot run job worker! - error while running multiple crawling jobs in parallel
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-2395
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2395
>             Project: Nutch
>          Issue Type: Bug
>          Components: nutch server
>    Affects Versions: 2.3.1
>         Environment: Ubuntu 16.04 64-bit
> Oracle Java 8 64-bit
> Nutch 2.3.1 (standalone deployment)
> MongoDB 3.4
>            Reporter: Vyacheslav Pascarel
>            Priority: Major
>             Fix For: 2.4
>
>
> My application executes multiple Nutch jobs in parallel using the Nutch REST 
> services. The application injects a seed URL and then repeats the 
> GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times to emulate 
> continuous crawling (each step in the sequence is executed upon successful 
> completion of the previous step, then the whole sequence is repeated). 
> Here is a brief description of the jobs:
> * Number of parallel jobs: 7
> * Each job has unique crawl id and MongoDB collection
> * Seed URL for all jobs: http://www.cnn.com
> * Regex URL filters for all jobs: 
> ** *"-^.\{1000,\}$"* - exclude very long URLs
> ** *"+."* - include the rest
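> In regex-urlfilter.txt form (assuming the stock Nutch urlfilter-regex plugin, 
> where each line is a Java regex prefixed with + to accept or - to reject, 
> applied in order) the two rules above would look like:
> {code}
> # exclude very long URLs (1000 characters or more)
> -^.{1000,}$
> # accept everything else
> +.
> {code}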
> The jobs are started as expected but at some point some of them fail with 
> "Cannot run job worker!" error. For more details see job status and 
> hadoop.log lines below.
> While debugging the crash I noticed that a single instance of 
> SelectorEntryComparator (defined as a nested class in GeneratorJob) is shared 
> across multiple reducer tasks. The class inherits from 
> org.apache.hadoop.io.WritableComparator, which has a few members unprotected 
> against concurrent use. At some point multiple threads may access those members 
> in a WritableComparator.compare call. I modified SelectorEntryComparator and it 
> seems to have solved the problem, but I am not sure whether the change is 
> appropriate and/or sufficient (it covers GENERATE only?)
> Original code:
> {code:java}
> public static class SelectorEntryComparator extends WritableComparator {
>     public SelectorEntryComparator() {
>       super(SelectorEntry.class, true);
>     }
> }
> {code}
> Modified code:
> {code:java}
> public static class SelectorEntryComparator extends WritableComparator {
>     public SelectorEntryComparator() {
>       super(SelectorEntry.class, true);
>     }
>     
>     @Override
>     public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>       return super.compare(b1, s1, l1, b2, s2, l2);
>     }    
> }
> {code}
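For context, a WritableComparator constructed with createInstances=true keeps a reusable DataInputBuffer and two reusable key instances as fields, which is why one SelectorEntryComparator instance shared across reducer threads is unsafe under concurrent compare() calls. The sketch below is a stdlib-only toy (a hypothetical class, not Hadoop or Nutch code) that models the same shared-buffer hazard together with the synchronized fix shown above:

```java
import java.util.Arrays;

// Toy model (NOT Hadoop code) of the hazard in NUTCH-2395: like a
// WritableComparator built with createInstances=true, this comparator keeps
// reusable deserialization buffers as instance fields, so concurrent
// compare() calls on one shared instance can interleave and corrupt state.
class SharedBufferComparator {
    private final byte[] left = new byte[64];   // shared scratch buffers,
    private final byte[] right = new byte[64];  // analogous to key1/key2/buffer

    // Unsafe when one instance is shared across reducer threads: two threads
    // can overwrite left/right while another thread is still comparing them.
    public int compare(byte[] a, byte[] b) {
        System.arraycopy(a, 0, left, 0, a.length);   // assumes length <= 64
        System.arraycopy(b, 0, right, 0, b.length);
        return Arrays.compare(left, 0, a.length, right, 0, b.length);
    }

    // The reported fix, applied to the toy: serialize access so the shared
    // buffers are never written by two threads at once.
    public synchronized int compareSafe(byte[] a, byte[] b) {
        return compare(a, b);
    }
}
```

Under LocalJobRunner several tasks in one JVM share a comparator instance, so only the synchronized variant is safe there; giving each task its own comparator instance would avoid the lock entirely.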
> Example of failed job status:
> {code}
> {
> "id" : "parallel_0-65ff2f1b-382e-4eb2-a813-a0370b84d5b6-GENERATE-1961495833",
> "type" : "GENERATE",
> "confId" : "65ff2f1b-382e-4eb2-a813-a0370b84d5b6",
> "args" : { "topN" : "100" },
> "result" : null,
> "state" : "FAILED",
> "msg" : "ERROR: java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036",
> "crawlId" : "parallel_0"
> }
> {code}
> Lines from hadoop.log:
> {code}
> 2017-06-21 11:45:13,021 WARN  mapred.LocalJobRunner - job_local1142434549_0036
> java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
>         at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
>         at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
>         at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>         at org.apache.hadoop.io.Text.readString(Text.java:466)
>         at org.apache.hadoop.io.Text.readString(Text.java:457)
>         at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
>         ... 12 more
> 2017-06-21 11:45:13,058 WARN  mapred.LocalJobRunner - job_local1976432650_0038
> java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1245)
>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
>         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1575)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:267)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
>         at org.apache.hadoop.io.Text.readString(Text.java:464)
>         at org.apache.hadoop.io.Text.readString(Text.java:457)
>         at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
>         ... 15 more
> {code}
> {code}
> 2017-06-21 11:45:13,372 ERROR impl.JobWorker - Cannot run job worker!
> java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036
>         at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
>         at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
>         at org.apache.nutch.api.impl.JobWorker.run(JobWorker.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
