Hi Matthias,
Thank you very much for the explanation and the better solution. s =
colSums(x==0) is more concise and works great!
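
For completeness, here is a tiny NumPy sketch (my own toy matrix and names,
not SystemML code) illustrating why the two formulations agree: the
per-column loop and the single column-wise aggregate count the same zeros.

```python
import numpy as np

# Small stand-in for the 35M x 2396 input matrix (illustrative only).
x = np.array([[0.0, 1.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 3.0, 4.0]])

# Loop formulation, analogous to the parfor script:
# for each column i, count the zeros in x[, i].
loop_counts = [int(np.sum(x[:, i] == 0)) for i in range(x.shape[1])]

# Data-parallel formulation, analogous to s = colSums(x == 0):
# one elementwise comparison followed by a column-wise sum.
s = (x == 0).sum(axis=0)

print(loop_counts)  # [2, 1, 2]
print(s.tolist())   # [2, 1, 2]
```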
As an experiment, I tried the original parfor script again with the SystemML
configuration file provided. On my cluster it still fails with "PARFOR:
Failed to execute loop in parallel". The failed MR jobs appear to be caused
by:

Caused by: org.apache.sysml.runtime.DMLRuntimeException: Failed to create
non-existing local working directory: /path.to/ethan.xu/tmp/systemml

The directory '/path.to/ethan.xu/tmp/systemml' exists on the local server,
and it has subdirectories named '_p22748_127.0.0.1' etc., so it looks like
other SystemML jobs had no trouble writing to it.
The stderr and one failed MR log are attached.
Thanks,
Ethan
On Thu, Apr 14, 2016 at 11:14 PM, Matthias Boehm <[email protected]> wrote:
> just for completeness, this issue is tracked with
> https://issues.apache.org/jira/browse/SYSTEMML-635 and the fix will be
> available tomorrow.
>
> Regards,
> Matthias
>
>
> From: Matthias Boehm/Almaden/IBM@IBMUS
> To: [email protected]
> Cc: "Ethan Xu" <[email protected]>
> Date: 04/14/2016 07:53 PM
> Subject: Re: parfor fails
> ------------------------------
>
>
>
> Hi Ethan,
>
> thanks for catching this issue. The parfor script itself is perfectly fine,
> but you encountered an interesting runtime bug. Usually, you can find the
> actual cause at the bottom of the stack trace or in previous exceptions. I
> was able to reproduce this issue when NO SystemML config file is provided
> (it fails while parsing this non-existing config during parfor MR job task
> setup). So the workaround is to put a SystemML-config.xml into the same
> directory. Interestingly, the issue did not show up in our test suite
> because we always specify a default configuration there (which was until
> recently mandatory).
>
> As a side note, we strongly recommend parfor over for loops here because,
> thanks to automatic data partitioning, it runs the entire loop in 1 MR job
> instead of 2396. However, for the specific example at hand, a data-parallel
> formulation (with "s = colSums(x==0)") would be even better, as it allows
> for partial aggregation and hence reduces shuffle.
>
> Regards,
> Matthias
>
>
> From: Ethan Xu <[email protected]>
> To: [email protected]
> Date: 04/14/2016 01:34 PM
> Subject: parfor fails
> ------------------------------
>
>
>
> Hello,
>
> I have a quick question. The following script fails with this error:
>
> org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to execute
> loop in parallel.
>
> Here is the dml script:
>
> x = read($X);
>
> print("number of rows of x = " + nrow(x));
> print("number of cols of x = " + ncol(x));
>
> parfor(i in 1:ncol(x), check=0) {
>     a = x[,i];
>     print("number of 0's in col " + i + " = " + sum(a == 0));
> }
>
> where X is a 35 million by 2396 matrix (coded and dummy coded numerical
> matrix) on HDFS. The script runs fine with regular 'for' loops.
>
> Could someone explain why this script cannot run in parallel? Did I code
> the parfor incorrectly?
>
> Thanks,
>
> Ethan
>
>
>
>
16/04/16 17:52:18 INFO api.DMLScript: END DML run 04/16/2016 17:52:18
16/04/16 17:52:18 ERROR api.DMLScript: Failed to execute DML script.
org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to execute loop in
parallel.
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:678)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to
execute loop in parallel.
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:627)
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
... 8 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: java.io.IOException:
Job failed!
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParForMR.runJob(RemoteDPParForMR.java:218)
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.executeRemoteMRParForDP(ParForProgramBlock.java:969)
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:610)
... 9 more
Caused by: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1373)
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParForMR.runJob(RemoteDPParForMR.java:184)
... 11 more
Exception in thread "main" org.apache.sysml.api.DMLException:
org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to execute loop in
parallel.
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:350)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to execute loop in
parallel.
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:678)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
... 6 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to
execute loop in parallel.
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:627)
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
... 8 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: java.io.IOException:
Job failed!
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParForMR.runJob(RemoteDPParForMR.java:218)
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.executeRemoteMRParForDP(ParForProgramBlock.java:969)
at
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:610)
... 9 more
Caused by: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1373)
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParForMR.runJob(RemoteDPParForMR.java:184)
... 11 more
2016-04-16 17:48:59,444 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2016-04-16 17:49:00,634 WARN org.apache.hadoop.conf.Configuration: session.id
is deprecated. Instead, use dfs.metrics.session-id
2016-04-16 17:49:00,635 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2016-04-16 17:49:00,715 WARN org.apache.hadoop.conf.Configuration:
slave.host.name is deprecated. Instead, use dfs.datanode.hostname
2016-04-16 17:49:01,287 INFO org.apache.hadoop.util.ProcessTree: setsid exited
with exit code 0
2016-04-16 17:49:01,296 INFO org.apache.hadoop.mapred.Task: Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39f253d6
2016-04-16 17:49:01,489 INFO org.apache.hadoop.mapred.ReduceTask: Using
ShuffleConsumerPlugin: org.apache.hadoop.mapred.ReduceTask$ReduceCopier
2016-04-16 17:49:01,522 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=140456768, MaxSingleShuffleLimit=35114192
2016-04-16 17:49:01,537 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,537 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,538 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,538 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,538 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,539 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,539 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,540 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,540 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,541 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,542 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,543 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,554 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,555 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,557 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,565 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,566 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,567 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,582 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,593 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.lz4]
2016-04-16 17:49:01,597 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread started: Thread for merging
on-disk files
2016-04-16 17:49:01,597 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:01,597 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread started: Thread for merging in
memory files
2016-04-16 17:49:01,600 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Need another 4980 map output(s) where 0
is already in progress
2016-04-16 17:49:01,600 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread started: Thread for polling Map
Completion Events
2016-04-16 17:49:01,600 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 0 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:02,286 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 244 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:03,366 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 240 outputs (0 slow hosts and77
dup hosts)
2016-04-16 17:49:08,732 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor [.lz4]
2016-04-16 17:49:08,742 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 369 segments...
2016-04-16 17:49:08,744 INFO org.apache.hadoop.mapred.Merger: Merging 369
sorted segments
2016-04-16 17:49:08,749 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 369 segments left of total size: 21105900 bytes
2016-04-16 17:49:08,819 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 171 outputs (0 slow hosts
and1302 dup hosts)
2016-04-16 17:49:08,904 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 369 files in-memory
complete. Local file is
/datastore/hdfs6/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_873.out
of size 116760
2016-04-16 17:49:08,914 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:08,919 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 244 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:12,219 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 370 segments...
2016-04-16 17:49:12,220 INFO org.apache.hadoop.mapred.Merger: Merging 370
sorted segments
2016-04-16 17:49:12,222 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 370 segments left of total size: 21258605 bytes
2016-04-16 17:49:12,253 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 370 files in-memory
complete. Local file is
/datastore/hdfs7/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_119.out
of size 117498
2016-04-16 17:49:12,260 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:12,312 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 164 outputs (0 slow hosts
and1300 dup hosts)
2016-04-16 17:49:12,360 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 244 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:15,619 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 370 segments...
2016-04-16 17:49:15,619 INFO org.apache.hadoop.mapred.Merger: Merging 370
sorted segments
2016-04-16 17:49:15,621 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 370 segments left of total size: 21097865 bytes
2016-04-16 17:49:15,643 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 370 files in-memory
complete. Local file is
/datastore/hdfs8/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_272.out
of size 116662
2016-04-16 17:49:15,649 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:15,699 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 164 outputs (0 slow hosts
and1147 dup hosts)
2016-04-16 17:49:15,770 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 244 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:19,009 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 367 segments...
2016-04-16 17:49:19,009 INFO org.apache.hadoop.mapred.Merger: Merging 367
sorted segments
2016-04-16 17:49:19,011 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 367 segments left of total size: 21105896 bytes
2016-04-16 17:49:19,030 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 367 files in-memory
complete. Local file is
/datastore/hdfs9/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_124.out
of size 116670
2016-04-16 17:49:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:19,078 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 163 outputs (0 slow hosts
and1094 dup hosts)
2016-04-16 17:49:19,737 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 209 outputs (0 slow hosts
and392 dup hosts)
2016-04-16 17:49:22,639 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 368 segments...
2016-04-16 17:49:22,639 INFO org.apache.hadoop.mapred.Merger: Merging 368
sorted segments
2016-04-16 17:49:22,641 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 368 segments left of total size: 21097861 bytes
2016-04-16 17:49:22,657 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 368 files in-memory
complete. Local file is
/datastore/hdfs10/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_2192.out
of size 117286
2016-04-16 17:49:22,663 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:22,740 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 240 outputs (0 slow hosts and31
dup hosts)
2016-04-16 17:49:25,421 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 367 segments...
2016-04-16 17:49:25,421 INFO org.apache.hadoop.mapred.Merger: Merging 367
sorted segments
2016-04-16 17:49:25,422 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 367 segments left of total size: 21105896 bytes
2016-04-16 17:49:25,435 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 367 files in-memory
complete. Local file is
/datastore/hdfs1/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_4317.out
of size 116262
2016-04-16 17:49:25,442 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:25,501 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 243 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:25,558 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 243 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:28,352 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 368 segments...
2016-04-16 17:49:28,353 INFO org.apache.hadoop.mapred.Merger: Merging 368
sorted segments
2016-04-16 17:49:28,354 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 368 segments left of total size: 21049639 bytes
2016-04-16 17:49:28,363 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 368 files in-memory
complete. Local file is
/datastore/hdfs2/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_452.out
of size 116525
2016-04-16 17:49:28,368 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:28,419 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 161 outputs (0 slow hosts
and738 dup hosts)
2016-04-16 17:49:28,749 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 223 outputs (0 slow hosts
and128 dup hosts)
2016-04-16 17:49:31,191 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 377 segments...
2016-04-16 17:49:31,191 INFO org.apache.hadoop.mapred.Merger: Merging 377
sorted segments
2016-04-16 17:49:31,193 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 377 segments left of total size: 21523840 bytes
2016-04-16 17:49:31,215 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 377 files in-memory
complete. Local file is
/datastore/hdfs3/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_2577.out
of size 118587
2016-04-16 17:49:31,220 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:31,248 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 241 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:33,980 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 369 segments...
2016-04-16 17:49:33,981 INFO org.apache.hadoop.mapred.Merger: Merging 369
sorted segments
2016-04-16 17:49:33,982 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 369 segments left of total size: 21154122 bytes
2016-04-16 17:49:33,992 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 369 files in-memory
complete. Local file is
/datastore/hdfs4/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_3199.out
of size 116377
2016-04-16 17:49:33,996 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:34,038 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 161 outputs (0 slow hosts
and490 dup hosts)
2016-04-16 17:49:34,116 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 237 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:36,761 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 367 segments...
2016-04-16 17:49:36,761 INFO org.apache.hadoop.mapred.Merger: Merging 367
sorted segments
2016-04-16 17:49:36,762 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 367 segments left of total size: 21065711 bytes
2016-04-16 17:49:36,766 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 153 outputs (0 slow hosts
and300 dup hosts)
2016-04-16 17:49:36,772 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 367 files in-memory
complete. Local file is
/datastore/hdfs5/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_3264.out
of size 117471
2016-04-16 17:49:36,779 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:36,825 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 196 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:39,450 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 366 segments...
2016-04-16 17:49:39,450 INFO org.apache.hadoop.mapred.Merger: Merging 366
sorted segments
2016-04-16 17:49:39,451 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 366 segments left of total size: 20961228 bytes
2016-04-16 17:49:39,461 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 366 files in-memory
complete. Local file is
/datastore/hdfs6/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_3035.out
of size 116975
2016-04-16 17:49:39,466 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:39,505 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 101 outputs (0 slow hosts
and321 dup hosts)
2016-04-16 17:49:39,583 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 157 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:41,598 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 119 outputs (0 slow hosts and73
dup hosts)
2016-04-16 17:49:42,333 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 369 segments...
2016-04-16 17:49:42,335 INFO org.apache.hadoop.mapred.Merger: Merging 369
sorted segments
2016-04-16 17:49:42,337 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 369 segments left of total size: 21130536 bytes
2016-04-16 17:49:42,351 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 369 files in-memory
complete. Local file is
/datastore/hdfs7/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_4464.out
of size 117516
2016-04-16 17:49:42,355 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:42,372 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 110 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:42,436 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 86 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:42,466 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 60 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:44,478 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 30 outputs (0 slow hosts and62
dup hosts)
2016-04-16 17:49:45,266 INFO org.apache.hadoop.mapred.ReduceTask: Initiating
in-memory merge with 371 segments...
2016-04-16 17:49:45,267 INFO org.apache.hadoop.mapred.Merger: Merging 371
sorted segments
2016-04-16 17:49:45,267 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 371 segments left of total size: 21081793 bytes
2016-04-16 17:49:45,276 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Merge of the 371 files in-memory
complete. Local file is
/datastore/hdfs8/mapred/local/taskTracker/ethan.xu/jobcache/job_201604031517_24995/attempt_201604031517_24995_r_000152_0/output/map_1427.out
of size 117479
2016-04-16 17:49:45,281 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Thread waiting: Thread for merging
on-disk files
2016-04-16 17:49:45,419 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 40 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,441 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 25 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,446 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 21 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,506 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 15 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,515 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 11 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,532 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 8 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,541 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 5 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,543 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 4 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,545 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,547 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,549 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201604031517_24995_r_000152_0 Scheduled 1 outputs (0 slow hosts and0
dup hosts)
2016-04-16 17:49:45,550 INFO org.apache.hadoop.mapred.ReduceTask:
GetMapEventsThread exiting
2016-04-16 17:49:45,550 INFO org.apache.hadoop.mapred.ReduceTask:
getMapsEventsThread joined.
2016-04-16 17:49:45,551 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram
manager
2016-04-16 17:49:45,551 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved
on-disk merge complete: 13 files left.
2016-04-16 17:49:45,551 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
merge complete: 182 files left.
2016-04-16 17:49:46,860 INFO org.apache.hadoop.mapred.Merger: Merging 182
sorted segments
2016-04-16 17:49:46,860 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 182 segments left of total size: 10408279 bytes
2016-04-16 17:49:46,871 INFO org.apache.hadoop.mapred.ReduceTask: Merged 182
segments, 10408279 bytes to disk to satisfy reduce memory limit
2016-04-16 17:49:46,872 INFO org.apache.hadoop.mapred.ReduceTask: Merging 14
files, 1579600 bytes from disk
2016-04-16 17:49:46,873 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0
segments, 0 bytes from memory into reduce
2016-04-16 17:49:46,873 INFO org.apache.hadoop.mapred.Merger: Merging 14 sorted
segments
2016-04-16 17:49:46,881 INFO org.apache.hadoop.mapred.Merger: Down to the last
merge-pass, with 14 segments left of total size: 285137239 bytes
2016-04-16 17:49:47,036 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2016-04-16 17:49:47,038 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: Failed to create non-existing
local working directory: /path.to/ethan.xu/tmp/systemml
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParWorkerReducer.configure(RemoteDPParWorkerReducer.java:200)
... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: Failed to create
non-existing local working directory: /path.to/ethan.xu/tmp/systemml
at
org.apache.sysml.runtime.util.LocalFileUtils.createWorkingDirectoryWithUUID(LocalFileUtils.java:383)
at
org.apache.sysml.runtime.controlprogram.parfor.RemoteDPParWorkerReducer.configure(RemoteDPParWorkerReducer.java:180)
... 14 more
2016-04-16 17:49:47,041 INFO org.apache.hadoop.mapred.Task: Runnning cleanup
for the task