[ https://issues.apache.org/jira/browse/MAHOUT-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated MAHOUT-1570:
----------------------------------
    Description: 
With the finalized abstraction of the Mahout DSL plans from the backend 
operations (MAHOUT-1529), it should be possible to integrate further backends 
for the Mahout DSL. Apache Flink would be a suitable candidate for such an 
execution backend. 

With respect to the implementation, the biggest difference between Spark and 
Flink at the moment is probably the incremental rollout of plans, which is 
triggered by Spark's actions and is not yet supported by Flink. However, the 
Flink community is working on this issue. For the moment, it should be 
possible to circumvent this problem by writing the intermediate results 
required by an action to HDFS and reading them back from there.

  was:
With the finalized abstraction of logical Mahout DSL plans from the backend 
operations (MAHOUT-1529), it should be possible to integrate further backends 
for the Mahout DSL.

I would like to evaluate to what extent this can already be done for 
Stratosphere and what can be done to solve any problems that may occur. 

The biggest difference between Spark and Stratosphere at the moment is probably 
the incremental rollout of plans, which is triggered by Spark's actions and 
is not yet supported by Stratosphere. However, the Stratosphere team is 
working on this issue. For the moment, it should be possible to circumvent this 
problem by writing the intermediate results required by an action to HDFS and 
reading them back from there.

Thus, this work should be considered a proof of concept rather than a highly 
efficient implementation. Its purpose is to evaluate where the logical plan 
abstraction might need to be refined in order to support different 
backends. 


> Adding support for Apache Flink as a backend for the Mahout DSL
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-1570
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1570
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Sebastian Schelter
>              Labels: DSL, flink, scala

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
