[ https://issues.apache.org/jira/browse/MAHOUT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606414#comment-14606414 ]
ASF GitHub Bot commented on MAHOUT-1653:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/136#discussion_r33514280
--- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala ---
@@ -165,7 +168,14 @@ class CheckpointedDrmSpark[K: ClassTag](
      else if (classOf[Writable].isAssignableFrom(ktag.runtimeClass)) (x: K) => x.asInstanceOf[Writable]
      else throw new IllegalArgumentException("Do not know how to convert class tag %s to Writable.".format(ktag))
-    rdd.saveAsSequenceFile(path)
--- End diff ---
That is actually using the non-deprecated `.saveAsSequenceFile(path)`. I'm
just suggesting that we could skip all of the implicit conversions and
explicitly map the RDD to Writables ourselves, then call
`.saveAsSequenceFile(path)` on an RDD of, e.g., `[IntWritable, VectorWritable]`.
This is actually what Spark does in `.saveAsSequenceFile(path)`:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala#L97
If either the key or the value is not a `Writable`, it converts one or the
other (or both) to a `Writable` using, e.g., ```self.map(x => (anyToWritable(x._1),
anyToWritable(x._2)))```
and then calls `.saveAsHadoopFile(...)` on the mapped RDD.
If it detects that both are already `Writable`s, though, as would be the case
if we mapped them explicitly, it simply calls `.saveAsHadoopFile(...)`. So by
mapping them ourselves in `.dfsWrite(...)` we shouldn't incur any additional
overhead.
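A minimal sketch of that explicit mapping, assuming an `RDD[(Int, Vector)]` and a hypothetical helper name (not the actual `.dfsWrite(...)` code):
```scala
import org.apache.hadoop.io.IntWritable
import org.apache.mahout.math.{Vector, VectorWritable}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Map keys and values to Writables ourselves, so SequenceFileRDDFunctions
// sees two Writable types and skips its anyToWritable conversions.
def saveDrmAsSequenceFile(rdd: RDD[(Int, Vector)], path: String): Unit = {
  val writables: RDD[(IntWritable, VectorWritable)] =
    rdd.map { case (k, v) => (new IntWritable(k), new VectorWritable(v)) }
  writables.saveAsSequenceFile(path)
}
```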
Actually, we may just be able to call `.saveAsHadoopFile(...)` directly on the
Writable-mapped RDD from `.dfsWrite(...)`.
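For illustration, a sketch of that direct call, again with assumed types and the old `mapred` `SequenceFileOutputFormat`:
```scala
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapred.SequenceFileOutputFormat
import org.apache.mahout.math.{Vector, VectorWritable}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Skip saveAsSequenceFile entirely and go straight to saveAsHadoopFile
// on an RDD that has already been mapped to Writables.
def saveDrmDirect(rdd: RDD[(Int, Vector)], path: String): Unit = {
  rdd
    .map { case (k, v) => (new IntWritable(k), new VectorWritable(v)) }
    .saveAsHadoopFile(path, classOf[IntWritable], classOf[VectorWritable],
      classOf[SequenceFileOutputFormat[IntWritable, VectorWritable]])
}
```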
> Spark 1.3
> ---------
>
> Key: MAHOUT-1653
> URL: https://issues.apache.org/jira/browse/MAHOUT-1653
> Project: Mahout
> Issue Type: Dependency upgrade
> Reporter: Andrew Musselman
> Assignee: Andrew Palumbo
> Fix For: 0.11.0
>
>
> Support Spark 1.3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)