It seems the problem is with

    String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";
on line 137 of org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter: conf.getBoolean(JobContext.TASK_ISMAP, true) returns true (presumably falling back to the provided default value) even in reducers, so 'charIx' on line 139 gets the value -1, which leads to the error. Confirmed on my test case: when the error occurs in a reducer, 'name' has the value '0-r-00000' but 'taskType' has the value 'm'.

The following (ugly) hack fixed the problem; the DML ran successfully afterwards. Change lines 137 - 139 of org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter to:

    // locate the "-m-"/"-r-" task-type marker in the output file name instead
    // of trusting JobContext.TASK_ISMAP (requires java.util.regex.Pattern and
    // java.util.regex.Matcher)
    String name = file.getName();
    Pattern p = Pattern.compile("-[rm]-");
    Matcher m = p.matcher(name);
    int charIx = 0;
    if (m.find()) {
        charIx = m.start();
    } else {
        throw new RuntimeException("file name '" + name + "' doesn't contain '-m-' or '-r-'");
    }
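A variant that avoids parsing the output file name would be to derive the task type from the task attempt id, which as far as I can tell both MR1 and MR2 set under the 'mapred.task.id' property (MR2 keeps it as a deprecated alias). An untested sketch, with a hypothetical TaskTypeCompat helper:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TaskAttemptID;

    // Untested sketch, not the SystemML code: derive "m"/"r" from the task
    // attempt id rather than from JobContext.TASK_ISMAP, whose underlying
    // property name differs between MR1 and MR2.
    public class TaskTypeCompat {
        public static String taskType(JobConf conf) {
            TaskAttemptID id = TaskAttemptID.forName(conf.get("mapred.task.id"));
            return id.getTaskID().isMap() ? "m" : "r";
        }
    }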
Ethan

On Fri, Feb 5, 2016 at 4:37 PM, Ethan Xu <ethan.yifa...@gmail.com> wrote:

> Thanks, I tried that and moved a bit further. Now a new exception (still in the reduce phase of 'CSV-Reblock-MR'):
>
> WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring(String.java:1911)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFileToDestination(MultipleOutputCommitter.java:140)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFinalTaskOutputs(MultipleOutputCommitter.java:119)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.commitTask(MultipleOutputCommitter.java:94)
>     at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
>     at org.apache.hadoop.mapred.Task.commit(Task.java:1005)
>     at org.apache.hadoop.mapred.Task.done(Task.java:875)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> On Fri, Feb 5, 2016 at 4:03 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>
>> ok that is interesting. I think the following is happening: the hadoop version is > 2.0, which makes SystemML switch to the 2.x configuration properties. However, because MR1 is bundled into this distribution, these configurations do not exist, which makes us fail when processing task ids.
>>
>> Workaround: change org.apache.sysml.runtime.matrix.mapred.MRConfigurationNames line 85 to boolean hadoopVersion2 = false;
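The idea behind this flag, as a simplified sketch (assumed, not the actual MRConfigurationNames source; the MRConfigProbe class below is only for illustration): a version probe picks between the MR1 and MR2 property names, and CDH4 reports a 2.x version string even though it bundles MR1, so the lookup takes the 2.x branch and the properties are never found.

    import org.apache.hadoop.util.VersionInfo;

    // Assumed sketch of the version probe, not the SystemML source: on CDH4
    // with bundled MR1 the reported version starts with "2", so the 2.x
    // property name is chosen although only the MR1 name exists at runtime.
    public class MRConfigProbe {
        public static String taskIsMapKey() {
            boolean hadoopVersion2 = VersionInfo.getVersion().startsWith("2");
            return hadoopVersion2 ? "mapreduce.task.ismap"  // MR2 name
                                  : "mapred.task.is.map";   // MR1 name
        }
    }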
>> Regards,
>> Matthias
>>
>> From: Ethan Xu <ethan.yifa...@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 02/05/2016 12:36 PM
>> Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> ------------------------------
>>
>> Thank you very much. I just pulled the update, rebuilt the project and reran the code.
>>
>> The method-not-found error was gone, and the MapReduce job was kicked off. The 'Assign-RowID-MR' job finished successfully. The map phase of the 'CSV-Reblock-MR' job finished as well, but the reducers threw NullPointerExceptions at
>>
>> java.lang.NullPointerException
>>     at org.apache.sysml.runtime.matrix.mapred.ReduceBase.close(ReduceBase.java:205)
>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> The job I ran was the same as before, on the same data:
>>
>>     hadoop jar <SystemML dir>/target/SystemML.jar -libjars <local dir>/hadoop-lzo-0.4.15.jar -f <SystemML dir>/scripts/algorithms/Univar-Stats.dml -nvargs X=<HDFS dir>/original-coded.csv TYPES=<HDFS dir>/original-coded-type.csv STATS=<HDFS dir>/univariate-summary.csv
>>
>> The hadoop cluster was also the same one: CDH4.2.1.
>>
>> Sorry to keep coming back with problems on a really old hadoop system. Please let me know what other information is needed to diagnose the issue.
>>
>> Ethan
>>
>> On Fri, Feb 5, 2016 at 1:26 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>>
>> > Hi Ethan,
>> >
>> > I believe your safest, cleanest bet is to wait for the fix from Matthias. When he pushes the fix, you will see it at https://github.com/apache/incubator-systemml/commits/master. At that point, you can pull (git pull) the changes from GitHub to your machine and then build with Maven utilizing the new changes.
>> >
>> > Alternatively, it's not really recommended, but you might be able to use -libjars to reference the hadoop-common jar, which should be in your local maven repository (.m2/repository/org/apache/hadoop/hadoop-common/2.4.1/hadoop-common-2.4.1.jar). However, mixing jar versions usually doesn't work very well (it can lead to other problems), so waiting for the fix is best.
>> >
>> > Deron
>> >
>> > On Fri, Feb 5, 2016 at 6:47 AM, Ethan Xu <ethan.yifa...@gmail.com> wrote:
>> >
>> > > Thank you Shirish and Deron for the suggestions. Looking forward to the fix from Matthias!
>> > >
>> > > We are using the hadoop-common shipped with CDH4.2.1, and it's in the classpath. I'm a bit hesitant to alter our hadoop configuration to include other versions since other people are using it too.
>> > >
>> > > Not sure if/how the following naive approach affects the program behavior, but I did try changing the scope of
>> > >
>> > >     <groupId>org.apache.hadoop</groupId>
>> > >     <artifactId>hadoop-common</artifactId>
>> > >     <version>${hadoop.version}</version>
>> > >
>> > > in SystemML's pom.xml from 'provided' to 'compile' and rebuilt the jar (21MB), and it threw the same error.
>> > >
>> > > By the way, this is pom.xml lines 65 - 72:
>> > >
>> > >     <properties>
>> > >         <hadoop.version>2.4.1</hadoop.version>
>> > >         <antlr.version>4.3</antlr.version>
>> > >         <spark.version>1.4.1</spark.version>
>> > >
>> > >         <!-- OS-specific JVM arguments for running integration tests -->
>> > >         <integrationTestExtraJVMArgs />
>> > >     </properties>
>> > >
>> > > Am I supposed to modify the hadoop.version before the build?
>> > >
>> > > Thanks again,
>> > >
>> > > Ethan
>> > >
>> > > On Fri, Feb 5, 2016 at 2:29 AM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>> > >
>> > > > Hi Matthias,
>> > > >
>> > > > Glad to hear the fix is simple. Mixing jar versions sometimes is not very fun.
>> > > >
>> > > > Deron
>> > > >
>> > > > On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>> > > >
>> > > > > well, let's not mix different hadoop versions in the class path or client/server. If I'm not mistaken, cdh 4.x always shipped with MR v1. It's a trivial fix for us and will be in the repo tomorrow morning anyway. Thanks for catching this issue, Ethan.
>> > > > >
>> > > > > Regards,
>> > > > > Matthias
>> > > > >
>> > > > > From: Deron Eriksson <deroneriks...@gmail.com>
>> > > > > To: dev@systemml.incubator.apache.org
>> > > > > Date: 02/04/2016 11:04 PM
>> > > > > Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > ------------------------------
>> > > > >
>> > > > > Hi Ethan,
>> > > > >
>> > > > > Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/), since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the org.apache.hadoop.conf.Configuration class in that jar doesn't appear to have a getDouble method, so using that version of hadoop-common won't work.
>> > > > >
>> > > > > However, the hadoop-common-2.4.1.jar (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/) does appear to have the getDouble method. It's possible that adding that jar to your classpath may fix your problem, as Shirish pointed out.
>> > > > >
>> > > > > It sounds like Matthias may have another fix.
>> > > > >
>> > > > > Deron
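A small compatibility shim would be another way around the missing method, if patching SystemML locally is acceptable. An untested sketch with a hypothetical ConfCompat helper, relying only on Configuration.get(String), which exists on every Hadoop version:

    import org.apache.hadoop.conf.Configuration;

    // Untested sketch: emulate JobConf.getDouble(key, default) on clusters
    // whose hadoop-common (2.0.0-cdh4.x) lacks Configuration.getDouble.
    public class ConfCompat {
        public static double getDouble(Configuration conf, String key, double defaultValue) {
            String raw = conf.get(key);
            return (raw == null) ? defaultValue : Double.parseDouble(raw.trim());
        }
    }

The call site in MRJobConfiguration (the line 1194/1195 mentioned below) would then use ConfCompat.getDouble(job, key, default) instead of job.getDouble(key, default).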
>> > > > > On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>> > > > >
>> > > > > > well, we have indeed not run on MR v1 for a while now. However, I don't want to go so far as to say we don't support it anymore. I'll fix this particular issue by tomorrow.
>> > > > > >
>> > > > > > In the next couple of weeks we should run our full performance testsuite (for broad coverage) over an MR v1 cluster and systematically remove unnecessary incompatibilities like this one. Any volunteers?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Matthias
>> > > > > >
>> > > > > > From: Ethan Xu <ethan.yifa...@gmail.com>
>> > > > > > To: dev@systemml.incubator.apache.org
>> > > > > > Date: 02/04/2016 05:51 PM
>> > > > > > Subject: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > > ------------------------------
>> > > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I got an error when running the systemML/scripts/Univar-Stats.dml script on a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. The error message is at the bottom of this email. The same script ran fine on a smaller sample (several MB) of the same data set, when MR was not invoked.
>> > > > > >
>> > > > > > The main error was java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getDouble(). Digging deeper, it looks like the CDH4.2.1 version of MR indeed didn't have the JobConf.getDouble() method.
>> > > > > >
>> > > > > > The hadoop-core jar of CDH4.2.1 can be found here:
>> > > > > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/
>> > > > > >
>> > > > > > The calling line of SystemML is line 1194 of
>> > > > > > https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java
>> > > > > >
>> > > > > > I was wondering, if the finding is accurate, is there a potential fix, or does this mean the current version of SystemML is not compatible with CDH4.2.1?
>> > > > > >
>> > > > > > Thank you,
>> > > > > >
>> > > > > > Ethan
>> > > > > >
>> > > > > >     hadoop jar $sysDir/target/SystemML.jar -f $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs X=$baseDirHDFS/original-coded.csv TYPES=$baseDirHDFS/original-coded-type.csv STATS=$baseDirHDFS/univariate-summary.csv
>> > > > > >
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file (./SystemML-config.xml) found
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
>> > > > > > 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_71).
>> > > > > > SLF4J: Class path contains multiple SLF4J bindings.
>> > > > > > SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/slf4j-jdk14-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: SystemML Statistics:
>> > > > > > Total execution time: 0.880 sec.
>> > > > > > Number of executed MR Jobs: 0.
>> > > > > >
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: END DML run 02/04/2016 20:35:07
>> > > > > > Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getDouble(Ljava/lang/String;D)D
>> > > > > >     at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1195)
>> > > > > >     at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1129)
>> > > > > >     at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:307)
>> > > > > >     at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:289)
>> > > > > >     at org.apache.sysml.runtime.matrix.CSVReblockMR.runJob(CSVReblockMR.java:275)
>> > > > > >     at org.apache.sysml.lops.runtime.RunMRJobs.submitJob(RunMRJobs.java:257)
>> > > > > >     at org.apache.sysml.lops.runtime.RunMRJobs.prepareAndSubmitJob(RunMRJobs.java:143)
>> > > > > >     at org.apache.sysml.runtime.instructions.MRJobInstruction.processInstruction(MRJobInstruction.java:1500)
>> > > > > >     at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>> > > > > >     at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>> > > > > >     at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>> > > > > >     at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
>> > > > > >     at org.apache.sysml.api.DMLScript.execute(DMLScript.java:676)
>> > > > > >     at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
>> > > > > >     at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
>> > > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > > > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > > > > >     at java.lang.reflect.Method.invoke(Method.java:606)
>> > > > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)