[ https://issues.apache.org/jira/browse/MAHOUT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604310#comment-14604310 ]

ASF GitHub Bot commented on MAHOUT-1653:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/136#discussion_r33415160
  
    --- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala ---
    @@ -165,7 +168,14 @@ class CheckpointedDrmSpark[K: ClassTag](
          else if (classOf[Writable].isAssignableFrom(ktag.runtimeClass)) (x: K) => x.asInstanceOf[Writable]
          else throw new IllegalArgumentException("Do not know how to convert class tag %s to Writable.".format(ktag))
     
    -    rdd.saveAsSequenceFile(path)
    --- End diff ---
    
    It seems that the solution could be to map the DrmRdd's keys and values to their respective Writables before calling `.saveAsSequenceFile(path)`. We can use `k2wFunc` for the key, and since a DRM's values are always Vectors, we can simply map each value to a `VectorWritable`. This is essentially what `.saveAsSequenceFile` does internally for standard Writables.
    ```scala
    // convert the RDD's keys and values to Writables before saving
    rddInput.toDrmRdd().map(x => (k2wFunc(x._1), v2w(x._2)))
                       .saveAsSequenceFile(path)
    ```
    
    However, the above gives a runtime exception, since the Writables are not Java-serializable: `org.apache.spark.SparkException: Task not serializable`. Any thoughts on how to do this correctly?
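
    One possible workaround (a sketch only, not tested against this branch): since no shuffle occurs between the `map` and the write, the Writable records themselves never need Java serialization, so the exception most likely comes from the closure capturing the enclosing `CheckpointedDrmSpark` instance through `k2wFunc`. Copying the converter into a local val before the closure and saving through `saveAsHadoopFile` with explicit key/value classes might sidestep it. The local name `keyToWritable` and the `Text` key class are illustrative assumptions; the concrete key class would have to match whatever `k2wFunc` actually produces.

    ```scala
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapred.SequenceFileOutputFormat
    import org.apache.mahout.math.VectorWritable

    // Copy the converter into a local val so the closure captures only the
    // function itself, not the enclosing (non-serializable) class instance.
    val keyToWritable = k2wFunc

    rddInput.toDrmRdd()
      // Build the Writables inside the task: they are written straight to the
      // sequence file and never shipped between JVMs, so they never go through
      // Java serialization.
      .map { case (k, v) => (keyToWritable(k), new VectorWritable(v)) }
      .saveAsHadoopFile(
        path,
        classOf[Text],           // assumes String keys mapped to Text
        classOf[VectorWritable], // DRM values are always Vectors
        classOf[SequenceFileOutputFormat[Text, VectorWritable]])
    ```

    This would mirror what Spark's own `saveAsSequenceFile` does internally: the conversion to Writables happens lazily inside each task rather than on the driver.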


> Spark 1.3
> ---------
>
>                 Key: MAHOUT-1653
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1653
>             Project: Mahout
>          Issue Type: Dependency upgrade
>            Reporter: Andrew Musselman
>            Assignee: Andrew Palumbo
>             Fix For: 0.11.0
>
>
> Support Spark 1.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
