[ https://issues.apache.org/jira/browse/SYSTEMML-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Baunsgaard resolved SYSTEMML-775.
-------------------------------------------
Fix Version/s: Not Applicable
Resolution: Done
> Distribute Data for spark
> -------------------------
>
> Key: SYSTEMML-775
> URL: https://issues.apache.org/jira/browse/SYSTEMML-775
> Project: SystemDS
> Issue Type: Question
> Components: Algorithms
> Affects Versions: SystemML 0.10
> Reporter: Johannes Wilke
> Priority: Minor
> Fix For: Not Applicable
>
>
> Hi!
> I need to run computations in parallel on a Spark cluster with SystemML.
> The program runs fine on the cluster, but not in parallel, because I don't
> know how to distribute my data across the cluster so that SystemML can
> use it.
> In Scala I have tried the following:
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, dff, mc, false)
> sysMlMatrix.saveAsObjectFile("/home/hduser/test.obj")
> val sysMlMatrix2 = sc.sequenceFile[MatrixIndexes, MatrixBlock]("/home/hduser/test.obj", 1000)
> val sysMlMatrix3 = JavaPairRDD.fromRDD(sysMlMatrix2)
> ml.reset()
> ml.registerInput("X", sysMlMatrix3, numRows, numCols)
> But I get a ClassCastException when I try to load the object file.
> My matrix has 1000 rows, and I want to process these rows in parallel.
> How can I achieve this? I hope you can help me!
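
A likely cause of the ClassCastException, offered as a hedged note and not as a confirmed diagnosis: Spark's saveAsObjectFile writes a sequence file of NullWritable keys and BytesWritable values that hold Java-serialized records, so reading it back with sc.sequenceFile[MatrixIndexes, MatrixBlock] asks for types the file does not contain. The sketch below shows two ways around this. It reuses sc, dff, mc, ml, numRows, numCols and RDDConverterUtils from the snippet above and assumes they are in scope. The SystemML package name and the JavaPairRDD return type of dataFrameToBinaryBlock are assumptions for version 0.10, and the HDFS path is hypothetical.

```scala
// Sketch only, not a verified fix. Names from the snippet above are assumed
// to be in scope: sc, dff, mc, ml, numRows, numCols, RDDConverterUtils.
// The SystemML package below is an assumption for 0.10; adjust if needed.
import org.apache.hadoop.mapred.SequenceFileOutputFormat
import org.apache.sysml.runtime.matrix.data.{MatrixBlock, MatrixIndexes}

// Assumed to return a JavaPairRDD[MatrixIndexes, MatrixBlock].
val blocks = RDDConverterUtils.dataFrameToBinaryBlock(sc, dff, mc, false)

// Option 1: skip the file round trip and register the RDD directly.
ml.reset()
ml.registerInput("X", blocks, numRows, numCols)

// Option 2: if the blocks must be persisted, write them as a real
// (MatrixIndexes, MatrixBlock) sequence file so the reader's types match.
val path = "hdfs:///tmp/test-blocks" // hypothetical path visible to all executors
blocks.saveAsHadoopFile(
  path,
  classOf[MatrixIndexes],
  classOf[MatrixBlock],
  classOf[SequenceFileOutputFormat[MatrixIndexes, MatrixBlock]])

// Hadoop readers reuse Writable instances, so copy each record before
// caching or collecting the reloaded RDD.
val reloaded = sc.sequenceFile[MatrixIndexes, MatrixBlock](path)
```

If the object-file format is preferred, the matching reader is sc.objectFile, not sc.sequenceFile. Note also that the available parallelism depends on how many blocks the matrix is split into (the block dimensions are set in mc), so a matrix with only 1000 rows may end up in very few blocks.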