Happy new year all!

I like the idea of adding an ML module to Flink.

As I have mentioned to Kostas, Stephan, and Robert before, I would
love to see if we could work with the H2O project [1], and it seems the
community has added support for it as an Apache Mahout backend
binding [2].

So we might get some additional scalable ML algos like deep learning.
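
Just to make the "compose in the DSL, run on any backend" idea concrete,
here is a rough sketch in the Mahout math-scala ("Samsara") DSL, along the
lines of the adapter Stephan describes below. The imports and the context
setup are from memory / backend-specific placeholders, so please read it as
an assumption rather than working code against a released Flink binding:

    // Sketch only: backend-agnostic Mahout Samsara code.
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    // The distributed context comes from whichever backend binding is on
    // the classpath (Spark today, hopefully Flink and H2O as well) --
    // placeholder here.
    implicit val ctx: DistributedContext = ???

    // Distribute a small in-core matrix as a DRM (distributed row matrix).
    val drmX = drmParallelize(dense((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)))

    // Gram matrix X^T X: the DSL plans this as a distributed job, and
    // .collect gathers the small k x k result back in-core.
    val inCoreXtX = (drmX.t %*% drmX).collect

The nice part is that code like this should not care whether the context is
backed by Flink, Spark, or H2O, which is exactly why mixing it with the
specialized Flink algorithms Stephan mentions below sounds so attractive.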

Definitely would love to help with this initiative =)

- Henry

[1] https://github.com/h2oai/h2o-dev
[2] https://issues.apache.org/jira/browse/MAHOUT-1500

On Fri, Jan 2, 2015 at 6:46 AM, Stephan Ewen <se...@apache.org> wrote:
> Hi everyone!
>
> Happy new year, first of all, and I hope you had a nice end-of-the-year
> season.
>
> I thought that now is a good time to officially kick off the creation of
> a library of machine learning algorithms. There are a lot of individual
> artifacts and algorithms floating around that we should consolidate.
>
> The machine-learning library in Flink would stand on two legs:
>
>  - A collection of efficient implementations for common problems and
> algorithms, e.g., regression (logistic), clustering (k-means, Canopy),
> matrix factorization (ALS), ...
>
>  - An adapter to the linear algebra DSL in Apache Mahout.
>
> In the long run, the goal would be to mix and match code from
> both parts.
> The linear algebra DSL is very convenient when it comes to quickly
> composing an algorithm, or some custom pre- and post-processing steps.
> For some complex algorithms, however, a low-level, system-specific
> implementation is necessary to make the algorithm efficient.
> Being able to call the tailored algorithms from the DSL would combine the
> benefits.
>
>
> As a concrete initial step, I suggest doing the following:
>
> 1) We create a dedicated Maven sub-project for the ML library
> (flink-lib-ml). The project gets two sub-projects: one for the collection
> of specialized algorithms, one for the Mahout DSL bindings.
>
> 2) We add the code for the existing specialized algorithms. As follow-up
> work, we need to consolidate the data types across those algorithms to
> ensure that they can easily be combined/chained.
>
> 3) The code for the Flink bindings to the Mahout DSL will actually reside
> in the Mahout project, which we need to add as a dependency to flink-lib-ml.
>
> 4) We add some examples of Mahout DSL algorithms, and a template showing
> how to use them within Flink programs.
>
> 5) We create a good introductory readme.md outlining this structure. The
> readme can also track the implemented algorithms and the ones we put on the
> roadmap.
>
>
> Comments welcome :-)
>
>
> Greetings,
> Stephan
