Hello All, I have some questions about running SystemML scripts on HDFS (with hybrid_spark execution mode).
My current configuration: standalone HDFS on OS X (version 2.8) and Spark pre-built for Hadoop 2.7 (version 2.1.0). The *jps* output from my system is attached as an inline image. Both have been installed separately.

As far as I understand, to enable HDFS support we need to run the Spark master in yarn-client or yarn-cluster mode. (Is this understanding correct?)

My question: I don't have access to a cluster. Is there a way to set up yarn-client / yarn-cluster on my local system so that I can run SystemML scripts in hybrid_spark mode with HDFS? If yes, could you please point me to some documentation?

Thank you so much,
Krishna

PS: the sysout of what I have tried is attached below.
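To make my question concrete, here is what I assume a single-node (pseudo-distributed) YARN setup would involve, based on the standard Hadoop single-node docs — please correct me if this is the wrong direction. The paths are assumptions: HADOOP_HOME pointing at my Hadoop 2.8 install and SPARK_HOME at the Spark 2.1.0 install.

```shell
# Start the HDFS and YARN daemons of the standalone Hadoop install
# (pseudo-distributed mode, everything on one machine)
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Point Spark at the Hadoop configuration so it can locate the
# ResourceManager and fs.defaultFS
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Then submit with YARN as the master (remaining arguments elided)
$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode client ...
```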
# Standalone SystemML jar
SCRIPT_DIR=$SYSTEMML_HOME/scripts/*
BUILD_DIR=$SYSTEMML_HOME/target/*
LIB_DIR=$SYSTEMML_HOME/target/lib/*
HADOOP_HOME=$SYSTEMML_HOME/target/lib/hadoop/*
SYSTEMML_JAR=$SYSTEMML_HOME/target/systemml-1.0.0-SNAPSHOT.jar
FORMAT="csv"
ALGO=/Users/krishna/open-source/incubator-systemml/scripts/datagen/genRandData4Kmeans.dml

java -cp $SCRIPT_DIR:$BUILD_DIR:$LIB_DIR:$HADOOP_HOME \
  org.apache.sysml.api.DMLScript \
  -Dlog4j.configuration=file:'$SYSTEMML_HOME/conf/log4j.properties' \
  -f $ALGO -exec hybrid_spark \
  -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 \
    X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data \
    YbyC=hdfs:///data/YbyC.data fmt=$FORMAT

#### Logs

krishna@Krishna:~/open-source/scripts$ java -cp $SCRIPT_DIR:$BUILD_DIR:$LIB_DIR:$HADOOP_HOME \
  org.apache.sysml.api.DMLScript \
  -Dlog4j.configuration=file:'$SYSTEMML_HOME/conf/log4j.properties' \
  -f $ALGO -exec hybrid_spark \
  -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 \
    X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data \
    YbyC=hdfs:///data/YbyC.data fmt=$FORMAT
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
BEGIN K-MEANS GENERATOR SCRIPT
Generating cluster distribution (mixture) centroids...
Generating record-to-cluster assignments...
Generating within-cluster random shifts...
Generating records by shifting from centroids...
Computing record-to-cluster assignments by minimum centroid distance...
Computing useful statistics...
Writing out the resulting dataset...
Exception in thread "main" org.apache.sysml.api.DMLException: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 80 and 119 -- Error evaluating instruction: CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
    at org.apache.sysml.api.DMLScript.main(DMLScript.java:207)
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 80 and 119 -- Error evaluating instruction: CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
    at org.apache.sysml.api.DMLScript.execute(DMLScript.java:665)
    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:346)
    ... 1 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 80 and 119 -- Error evaluating instruction: CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
    ... 3 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Export to hdfs:///data/C.data failed.
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:779)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:694)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.writeCSVFile(VariableCPInstruction.java:826)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processWriteInstruction(VariableCPInstruction.java:773)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:642)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
    ... 6 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/data, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:423)
    at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.sysml.runtime.util.MapReduceTool.writeMetaDataFile(MapReduceTool.java:390)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.writeMetaData(CacheableData.java:960)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:772)
    ... 11 more
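The last cause, "Wrong FS: hdfs:/data, expected: file:///", makes me suspect that my Hadoop configuration directory (the one whose core-site.xml sets fs.defaultFS to the HDFS namenode) is simply not on the classpath, so the write falls back to the local filesystem. A sketch of what I plan to try next — the conf directory location is an assumption on my part; mine may differ:

```shell
# Add the Hadoop *configuration* directory (not the jars) to the
# classpath so core-site.xml / hdfs-site.xml are picked up and
# fs.defaultFS resolves to hdfs:// instead of file://
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop   # assumed — wherever core-site.xml lives

java -cp $SCRIPT_DIR:$BUILD_DIR:$LIB_DIR:$HADOOP_CONF_DIR \
  org.apache.sysml.api.DMLScript \
  -f $ALGO -exec hybrid_spark \
  -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 \
    X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data \
    YbyC=hdfs:///data/YbyC.data fmt=$FORMAT
```

Does that sound like the right track, or is YARN strictly required here?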