[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink

Till Rohrmann (JIRA) Mon, 09 Mar 2015 07:10:37 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353005#comment-14353005
 ]


Till Rohrmann commented on FLINK-1537:
--------------------------------------

Hi Sachin,
great to hear that you're interested in working on Flink's machine learning 
library. Since work on the ML library started only recently, there is still a 
lot of creative leeway. Depending on your interests, I'm sure that we can find an 
appropriate topic. There are problems which are more related to efficiently 
implementing algorithms with Flink and others where one has to work more on the 
system-side. 

It is very good that you have already gathered some experience with distributed 
systems and even better that you have also implemented a distributed random 
forest algorithm. Still, I'd recommend familiarising yourself a little with 
the system by reading the 
[documentation|http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html],
 going through the example jobs contained in the 
[repository|https://github.com/apache/flink/tree/master/flink-examples] and 
maybe even trying to implement one job yourself. That is the best way to 
understand Flink.

Next it would be awesome if you could implement the random forest algorithm 
with Flink. That could be your first contribution to the project. That way, the 
rest of the community will get to know you and you can see whether you enjoy 
being part of the community. Since you have already implemented the job on 
Hadoop, it should not be too difficult for you to implement it in Flink as 
well. But be aware that Flink offers a richer API than Hadoop (for example 
joins, co-groups and native iterations in addition to map and reduce), so some 
things can be done in a different way.

In the next few days, I'll merge the current state of the machine learning 
library into master. You can find the current version in my private 
[branch|https://github.com/tillrohrmann/flink/tree/flink-ml]. It would be great 
if you could stick to the paradigm of {{Transformer}} and {{Learner}}.
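
To give a rough idea of that paradigm, here is a minimal sketch in plain Java. 
The interface shapes, the use of {{List}} in place of Flink's {{DataSet}}, and 
the {{MeanCenterer}} example are all assumptions made up for illustration, not 
the actual flink-ml code. Please check the branch for the real definitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these are NOT the flink-ml interfaces.
public class PipelineSketch {

    /** Maps input elements to output elements (a List stands in for a DataSet). */
    interface Transformer<IN, OUT> {
        List<OUT> transform(List<IN> input);
    }

    /** Fits a model to training data; the fitted model is itself a Transformer. */
    interface Learner<IN, OUT> {
        Transformer<IN, OUT> fit(List<IN> trainingData);
    }

    /** Toy learner: learns the mean of the data and returns a centering transformer. */
    static class MeanCenterer implements Learner<Double, Double> {
        @Override
        public Transformer<Double, Double> fit(List<Double> trainingData) {
            double sum = 0.0;
            for (double x : trainingData) {
                sum += x;
            }
            final double mean = trainingData.isEmpty() ? 0.0 : sum / trainingData.size();
            // The returned transformer subtracts the learned mean from every element.
            return input -> {
                List<Double> out = new ArrayList<>();
                for (double x : input) {
                    out.add(x - mean);
                }
                return out;
            };
        }
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0);
        Transformer<Double, Double> model = new MeanCenterer().fit(data);
        System.out.println(model.transform(data)); // prints [-1.0, 0.0, 1.0]
    }
}
```

The point of the split is that anything which needs training (such as a random 
forest) would be a {{Learner}}, while the trained model and any stateless 
pre-processing step would be a {{Transformer}}, so that steps can be chained.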

So just let me know if you have a specific topic you'd like to work on.

> GSoC project: Machine learning with Apache Flink
> ------------------------------------------------
>
>                 Key: FLINK-1537
>                 URL: https://issues.apache.org/jira/browse/FLINK-1537
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Till Rohrmann
>            Priority: Minor
>              Labels: gsoc2015, java, machine_learning, scala
>
> Currently, the Flink community is setting up the infrastructure for a machine 
> learning library for Flink. The goal is to provide a set of highly optimized 
> ML algorithms and to offer a high level linear algebra abstraction to easily 
> do data pre- and post-processing. By defining a set of commonly used data 
> structures on which the algorithms work it will be possible to define complex 
> processing pipelines. 
> The Mahout DSL is a good fit as the linear algebra language in Flink. It 
> still has to be evaluated which means must be provided to allow an easy 
> transition between the high level abstraction and the optimized 
> algorithms.
> The machine learning library offers multiple starting points for a GSoC 
> project. Amongst others, the following projects are conceivable.
> * Extension of Flink's machine learning library by additional ML algorithms
> ** Stochastic gradient descent
> ** Distributed dual coordinate ascent
> ** SVM
> ** Gaussian mixture EM
> ** Decision trees
> ** ...
> * Integration of Flink with the Mahout DSL to support a high level linear 
> algebra abstraction
> * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
> learning algorithms
> * Implementation of a parameter-server-like distributed global state storage 
> facility for Flink. This also includes the extension of Flink to support 
> asynchronous iterations and update messages.
> Your own ideas for a possible contribution in the field of the machine 
> learning library are highly welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
