[
https://issues.apache.org/jira/browse/SYSTEMML-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664735#comment-15664735
]
Imran Younus edited comment on SYSTEMML-831 at 11/14/16 7:17 PM:
-----------------------------------------------------------------
I've tried to run t-SNE using the new features added by [~mboehm7]. I have a
40GB driver and a 120GB executor, and I'm using -exec hybrid_spark as suggested
above. I also vectorized the for loop in the x2p function as suggested by
[~mboehm7] and [~niketanpansare], but I'm still unable to run it with the
complete MNIST data set. Here is my spark-submit command:
{{> spark-submit --master=spark://rr-ram4.softlayer.com:7077 --conf
spark.executor.memory=120g --conf spark.driver.memory=80g
/home/iyounus/git/incubator-systemml/target/SystemML.jar -f
/home/iyounus/git/incubator-systemml/scripts/staging/tSNE.dml -exec
hybrid_spark -nvargs INPUT=data/mnist_train_no_labels.csv OUT=data/P.csv}}
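For reference, the x2p vectorization amounts to replacing the row-wise distance loop with the standard identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. A NumPy sketch of the idea (illustrative only, names are mine, this is not the DML code):

```python
import numpy as np

def pairwise_sq_dists(X):
    """Vectorized pairwise squared Euclidean distances:
    D[i, j] = ||x_i - x_j||^2, computed without an explicit row loop
    via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    sq = np.sum(X * X, axis=1)           # per-row squared norms
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)            # clamp tiny negatives from round-off

X = np.array([[0.0, 0.0],
              [3.0, 4.0]])
D = pairwise_sq_dists(X)                 # D[0, 1] == 25.0
```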
Here is the relevant part of the stack trace:
Caused by: org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program
block generated from statement block between lines 138 and 138 -- Error
evaluating instruction:
CP°extfunct°.defaultNS°x2p°2°1°X·MATRIX·DOUBLE°30·SCALAR·INT·true°P
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:675)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:358)
... 10 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error
in program block generated from statement block between lines 138 and 138 --
Error evaluating instruction:
CP°extfunct°.defaultNS°x2p°2°1°X·MATRIX·DOUBLE°30·SCALAR·INT·true°P
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
... 12 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing
function .defaultNS::x2p
at
org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error
in function program block generated from function statement block between lines
45 and 87 -- Error evaluating function program block
at
org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
at
org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
... 16 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error
in while program block generated from while statement block between lines 63
and 82 -- Error evaluating while program block
at
org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
at
org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
... 17 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error
in program block generated from statement block between lines 64 and 81 --
Error evaluating instruction:
SPARK°map/°_mVar235·MATRIX·DOUBLE°_mVar236·MATRIX·DOUBLE°_mVar238·MATRIX·DOUBLE°RIGHT°COL_VECTOR
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at
org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
... 18 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 73 in stage 9.0 failed 4 times, most recent failure: Lost task 73.3 in
stage 9.0 (TID 887, rr-ram4.softlayer.com): java.io.FileNotFoundException:
/tmp/spark-f95c253d-0361-4d44-90d1-cfe17520602c/executor-649f7d77-349e-4dfa-858b-7d0baccec5b2/blockmgr-1fa0c75c-cba0-493f-addd-db144bc0cbef/20/temp_shuffle_e15d0d1e-324b-4bd8-b45a-f547a0d79b80
(Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
at
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at
org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:339)
at
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:46)
at
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.toMatrixBlock(SparkExecutionContext.java:836)
at
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.toMatrixBlock(SparkExecutionContext.java:789)
at
org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:543)
at
org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:61)
at
org.apache.sysml.runtime.controlprogram.caching.CacheableData.acquireRead(CacheableData.java:464)
at
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getBroadcastForVariable(SparkExecutionContext.java:543)
at
org.apache.sysml.runtime.instructions.spark.BinarySPInstruction.processMatrixBVectorBinaryInstruction(BinarySPInstruction.java:157)
at
org.apache.sysml.runtime.instructions.spark.MatrixBVectorArithmeticSPInstruction.processInstruction(MatrixBVectorArithmeticSPInstruction.java:54)
at
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
... 21 more
Caused by: java.io.FileNotFoundException:
/tmp/spark-f95c253d-0361-4d44-90d1-cfe17520602c/executor-649f7d77-349e-4dfa-858b-7d0baccec5b2/blockmgr-1fa0c75c-cba0-493f-addd-db144bc0cbef/20/temp_shuffle_e15d0d1e-324b-4bd8-b45a-f547a0d79b80
(Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
at
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
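The innermost cause is the FileNotFoundException with "(Too many open files)", which suggests the executor process hit its file-descriptor limit while the BypassMergeSortShuffleWriter was opening one temp file per reduce partition. One thing I plan to try is raising the open-file limit for the worker processes; a sketch of the check (the 4096 target is just an example value, not a recommendation):

```python
import resource

# Inspect and raise the soft open-file limit for this process. The Spark
# workers would need the same done in their launch environment (e.g. via
# `ulimit -n` in the start script before spark-submit).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
# New soft limit must not exceed the hard ceiling; never lower an
# already-higher soft limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))
```

Alternatively, Spark's sort-based shuffle writer merges its outputs into fewer files; if I understand the docs correctly, lowering spark.shuffle.sort.bypassMergeThreshold below the partition count should avoid the bypass writer and its per-partition file handles.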
[~nakul02]
> Implement t-SNE algorithm
> -------------------------
>
> Key: SYSTEMML-831
> URL: https://issues.apache.org/jira/browse/SYSTEMML-831
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Imran Younus
> Attachments: out_2016_09_26_10.log
>
>
> This JIRA tracks the implementation of the t-distributed Stochastic Neighbor
> Embedding (t-SNE) algorithm for dimensionality reduction presented in this paper:
> Visualizing Data using t-SNE
> by Laurens van der Maaten, Geoffrey Hinton
> http://www.jmlr.org/papers/v9/vandermaaten08a.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)