[
https://issues.apache.org/jira/browse/MAHOUT-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969135#comment-14969135
]
ASF GitHub Bot commented on MAHOUT-1570:
----------------------------------------
Github user tillrohrmann commented on the pull request:
https://github.com/apache/mahout/pull/161#issuecomment-150217075
Well, I just saw that you're using Flink `0.9-SNAPSHOT`. In
`0.10-SNAPSHOT`, you can register default serializers for Kryo. I just tried
bumping Flink's version to `0.10-SNAPSHOT`, and it worked without problems. If
you move `VectorKryoSerializer` and `GenericMatrixKryoSerializer` to a module
which is accessible from the flink bindings module, then adding
```
env.addDefaultKryoSerializer(classOf[Vector], new VectorKryoSerializer())
env.addDefaultKryoSerializer(classOf[Matrix], new GenericMatrixKryoSerializer())
```
to the constructor of `FlinkDistributedContext` solved the OOM for me.
Additionally, you have to make both serializers serializable; otherwise,
Flink cannot ship them.
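To illustrate why serializability matters: Flink ships registered serializer *instances* to the workers via Java serialization. Here is a minimal, self-contained sketch of that round trip; `MySerializer` and `ShipCheck` are hypothetical stand-ins, not classes from the Mahout PR.

```scala
import java.io._

// Hypothetical stand-in for a stateful serializer such as VectorKryoSerializer.
// Marking it Serializable is what allows Flink to ship the instance.
class MySerializer(val precision: Int) extends Serializable {
  def describe(): String = s"serializer(precision=$precision)"
}

object ShipCheck {
  // Simulate shipping the instance: Java-serialize it to bytes and read it back,
  // as Flink effectively does when distributing a registered serializer instance.
  def roundTrip[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val shipped = roundTrip(new MySerializer(6))
    println(shipped.describe())  // prints: serializer(precision=6)
  }
}
```

If the class does not extend `Serializable`, `writeObject` throws a `NotSerializableException`, which is exactly the failure you would see when Flink tries to ship the serializer instance.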
Alternatively, you can give both serializers a zero-arg constructor
(Scala default parameter values don't count as such a constructor for Java).
Then the serializers don't have to be serializable, and you can register them
by class via
```
env.addDefaultKryoSerializer(classOf[Vector], classOf[VectorKryoSerializer])
env.addDefaultKryoSerializer(classOf[Matrix], classOf[GenericMatrixKryoSerializer])
```
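The parenthetical above is worth demonstrating: a Scala class whose parameters merely have default values does not get a real zero-arg constructor in the bytecode, whereas an explicit auxiliary constructor does. A small self-contained sketch (class names here are illustrative, not from the PR):

```scala
// Default parameter values: the bytecode still has only a one-arg constructor,
// so Kryo/Flink cannot instantiate this class reflectively without arguments.
class WithDefaults(val bufferSize: Int = 1024)

// An explicit auxiliary constructor produces a genuine zero-arg constructor
// that is visible to Java reflection.
class WithAuxCtor(val bufferSize: Int) {
  def this() = this(1024)
}

object CtorCheck {
  // Check via Java reflection whether a true zero-arg constructor exists.
  def hasZeroArgCtor(c: Class[_]): Boolean =
    c.getConstructors.exists(_.getParameterCount == 0)

  def main(args: Array[String]): Unit = {
    println(hasZeroArgCtor(classOf[WithDefaults]))  // prints: false
    println(hasZeroArgCtor(classOf[WithAuxCtor]))   // prints: true
  }
}
```

This is why registering by class (as above) only works once an explicit `def this()` is added to each serializer.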
> Adding support for Apache Flink as a backend for the Mahout DSL
> ---------------------------------------------------------------
>
> Key: MAHOUT-1570
> URL: https://issues.apache.org/jira/browse/MAHOUT-1570
> Project: Mahout
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Alexey Grigorev
> Labels: DSL, flink, scala
> Fix For: 0.11.1
>
>
> With the finalized abstraction of the Mahout DSL plans from the backend
> operations (MAHOUT-1529), it should be possible to integrate further backends
> for the Mahout DSL. Apache Flink would be a suitable candidate for an
> execution backend.
> With respect to the implementation, the biggest difference between Spark and
> Flink at the moment is probably the incremental rollout of plans, which is
> triggered by Spark's actions and which is not supported by Flink yet.
> However, the Flink community is working on this issue. For the moment, it
> should be possible to circumvent this problem by writing intermediate results
> required by an action to HDFS and reading from there.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)