[
https://issues.apache.org/jira/browse/SYSTEMML-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524778#comment-15524778
]
Imran Younus commented on SYSTEMML-831:
---------------------------------------
I've tried everything I can to make this code run on Spark, without any success.
The code has two functions. Right now I'm only trying to run the first of
these: {{x2p}}.
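For context, here is a minimal pure-Python sketch of what an x2p step computes in t-SNE. It assumes that x2p follows the perplexity calibration from the van der Maaten and Hinton paper: for each row, it binary-searches a Gaussian precision beta_i so that the entropy of the conditional distribution P_i equals log(perplexity). The instruction in the trace passes X and 30 (presumably the perplexity) and returns P. This is not the tSNE.dml code. The names `tol` and `max_iter` are hypothetical. Note the n x n distance matrix D: its size grows quadratically with the number of points.

```python
import math


def x2p(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Hypothetical sketch: return P with P[i][j] = p_{j|i} (rows sum to 1)."""
    n = len(X)
    # Pairwise squared Euclidean distances: an n x n matrix (quadratic memory).
    D = [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]
    target = math.log(perplexity)  # desired entropy H(P_i), in nats
    P = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dmin = min(D[i][j] for j in others)  # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, math.inf  # beta_i = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            # Unnormalized Gaussian weights; the shift cancels out after normalizing.
            w = {j: math.exp(-beta * (D[i][j] - dmin)) for j in others}
            s = sum(w.values())  # s >= 1 because the nearest neighbor has weight 1
            # Entropy H = log(s) + beta * E[d - dmin] under p_{j|i} = w_j / s.
            H = math.log(s) + beta * sum(w[j] * (D[i][j] - dmin) for j in others) / s
            if abs(H - target) < tol:
                break
            if H > target:  # distribution too flat: increase beta (sharpen)
                lo = beta
                beta = beta * 2 if hi == math.inf else (beta + hi) / 2
            else:  # distribution too peaked: decrease beta (widen)
                hi = beta
                beta = beta / 2 if lo == 0.0 else (beta + lo) / 2
        P.append([0.0 if j == i else w[j] / s for j in range(n)])
    return P


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 3.0]]
    P = x2p(X, perplexity=3.0)
    print([round(sum(row), 6) for row in P])  # each row sums to 1
```

The row-wise binary search is written as a plain loop here for clarity. A vectorized version would update all beta_i at once inside a single while loop and renormalize P by its row sums.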
I'm using the MNIST data. If I use only 10k points from the MNIST data, the code
runs on the driver node only, and it works fine. Once I use all 60k points, it
doesn't work. Here is how I'm running the code:
{{> spark-submit --master spark://rr-ram11:7077 --conf spark.driver.memory=20g
--conf spark.executor.memory=20g --conf spark.executor.cores=4 --class
org.apache.sysml.api.DMLScript target/SystemML.jar -f scripts/staging/tSNE.dml
-stats -explain -exec spark}}
I'm using a 10-node cluster with 512 GB of RAM on each node. I've tried many
different memory and core configurations, and nothing seems to work.
Here is the main problem:
org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 153 and 153 -- Error evaluating instruction: CP°extfunct°.defaultNS°x2p°2°1°X·MATRIX·DOUBLE°30·SCALAR·INT·true°P
at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:698)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:364)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 153 and 153 -- Error evaluating instruction: CP°extfunct°.defaultNS°x2p°2°1°X·MATRIX·DOUBLE°30·SCALAR·INT·true°P
at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
... 12 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function .defaultNS::x2p
at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 51 and 103 -- Error evaluating function program block
at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
... 16 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 69 and 98 -- Error evaluating while program block
at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
... 17 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 70 and 77 -- Error evaluating instruction: SPARK°map/°_mVar277·MATRIX·DOUBLE°_mVar278·MATRIX·DOUBLE°_mVar280·MATRIX·DOUBLE°RIGHT°COL_VECTOR
at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
... 18 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 15, rr-ram8.softlayer.com): org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1 locations. Most recent failure cause:
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:605)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:595)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:580)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:640)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
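A rough back-of-the-envelope check may help interpret the last line of the trace. It is a sketch rather than a diagnosis, and it assumes that x2p materializes dense n x n double matrices, such as the pairwise distance matrix. Sparsity, compression, and block metadata are ignored. A `Size exceeds Integer.MAX_VALUE` error in Spark 1.x typically means that a single block (here, a cached RDD partition fetched from another node) is larger than 2^31 - 1 bytes. The helper names below are hypothetical.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE: assumed max bytes per Spark 1.x block


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes for a dense matrix of doubles (8 bytes each), ignoring headers."""
    return rows * cols * 8


def min_partitions(rows: int, cols: int) -> int:
    """Fewest equal-sized partitions so that each holds at most INT_MAX bytes."""
    size = dense_bytes(rows, cols)
    return (size + INT_MAX - 1) // INT_MAX  # integer ceiling division


if __name__ == "__main__":
    for n in (10_000, 60_000):
        size = dense_bytes(n, n)
        print(f"n={n}: {size / 1e9:.1f} GB dense, >= {min_partitions(n, n)} partition(s)")
```

This prints 0.8 GB for 10k points and 28.8 GB for 60k points. The 10k figure sits comfortably inside a 20 GB driver, which is consistent with that run staying on the driver. At 60k points, a single dense n x n matrix no longer fits in the driver and would need at least 14 evenly sized partitions to keep every block under the limit. Any partition much larger than that average would trip this exception.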
Attached is the complete log file.
[~nakul02] [~mboehm7] [~niketanpansare]
> Implement t-SNE algorithm
> -------------------------
>
> Key: SYSTEMML-831
> URL: https://issues.apache.org/jira/browse/SYSTEMML-831
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Imran Younus
>
> This jira implements the t-distributed Stochastic Neighbor Embedding
> algorithm for dimensionality reduction presented in this paper:
> Visualizing Data using t-SNE
> by Laurens van der Maaten, Geoffrey Hinton
> http://www.jmlr.org/papers/v9/vandermaaten08a.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)