[ https://issues.apache.org/jira/browse/MAHOUT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604310#comment-14604310 ]
ASF GitHub Bot commented on MAHOUT-1653:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/136#discussion_r33415160
--- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala ---
@@ -165,7 +168,14 @@ class CheckpointedDrmSpark[K: ClassTag](
      else if (classOf[Writable].isAssignableFrom(ktag.runtimeClass)) (x: K) => x.asInstanceOf[Writable]
      else throw new IllegalArgumentException("Do not know how to convert class tag %s to Writable.".format(ktag))
-    rdd.saveAsSequenceFile(path)
--- End diff ---
It seems that the solution could be to map the DrmRdd keys and values to
their respective Writables before calling `.saveAsSequenceFile(path)`. We
can use `k2wFunc` for the key, and since the value of a DRM row is always a
Vector, we can simply wrap it in a `VectorWritable`. This is essentially
what `.saveAsSequenceFile` does internally for standard Writables.
```
// convert the DRM's keys and vectors to Writables before saving
rddInput.toDrmRdd()
  .map(x => (k2wFunc(x._1), new VectorWritable(x._2)))
  .saveAsSequenceFile(path)
```
However, the above throws a runtime exception, since Writables are not
Java-serializable: `org.apache.spark.SparkException: Task not
serializable`. Any thoughts on how to do this correctly?
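One possible workaround, offered as a sketch rather than a tested fix: `Task not serializable` usually points at the map closure capturing something non-serializable from the enclosing scope (here, plausibly the `CheckpointedDrmSpark` instance), not at the Writables inside the RDD, since elements of an RDD that is only mapped and saved are never run through Java serialization. Copying the converter into a local val keeps the closure self-contained; `keyToWritable` is a hypothetical local name:
```
import org.apache.hadoop.io.Writable
import org.apache.mahout.math.VectorWritable

// Copy the converter into a local val so the closure captures only this
// (serializable) Scala function, not the enclosing class instance.
val keyToWritable: K => Writable = k2wFunc

// Build the Writables inside the closure, one pair per element, then save.
rddInput.toDrmRdd()
  .map(x => (keyToWritable(x._1), new VectorWritable(x._2)))
  .saveAsSequenceFile(path)
```
If the closure is already clean and the exception persists, another avenue (again an assumption, not verified here) would be to bypass the Writable view-bound machinery and call `saveAsHadoopFile` with `SequenceFileOutputFormat` and explicit key/value classes.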
> Spark 1.3
> ---------
>
> Key: MAHOUT-1653
> URL: https://issues.apache.org/jira/browse/MAHOUT-1653
> Project: Mahout
> Issue Type: Dependency upgrade
> Reporter: Andrew Musselman
> Assignee: Andrew Palumbo
> Fix For: 0.11.0
>
>
> Support Spark 1.3