[
https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504664#comment-14504664
]
ASF GitHub Bot commented on FLINK-1807:
---------------------------------------
GitHub user thvasilo opened a pull request:
https://github.com/apache/flink/pull/613
[WIP] - [FLINK-1807/1889] - Optimization framework and initial SGD
implementation
This is a WIP PR for the optimization framework of the Flink ML library.
The design is a mix between how sklearn and Apache Spark implement their
learning algorithm optimization frameworks.
The idea is that a Learner can take a Solver, LossFunction and
RegularizationType as parameters, similar to the design that sklearn uses and
the one Spark seems to be heading toward. This allows for flexibility in how
users design their learning algorithms.
A Solver uses the LossFunction and RegularizationType to optimize the
weights according to the provided DataSet of LabeledVector (label,
featuresVector).
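To make the division of responsibilities concrete, here is a minimal Scala sketch of the abstractions described above, with squared loss as a concrete example. All names and signatures here are assumptions made for illustration; they are not the actual interfaces in the PR.

```scala
// Illustrative sketch of the Solver / LossFunction / RegularizationType
// split described above; names and signatures are assumptions, not PR code.
case class LabeledVector(label: Double, features: Vector[Double])

trait LossFunction {
  // Loss value and gradient for a single example at the given weights.
  def lossGradient(example: LabeledVector,
                   weights: Vector[Double]): (Double, Vector[Double])
}

trait RegularizationType {
  // The regularization term's contribution to the gradient.
  def regGradient(weights: Vector[Double], regParam: Double): Vector[Double]
}

trait Solver {
  def optimize(data: Seq[LabeledVector],
               loss: LossFunction,
               reg: RegularizationType,
               initialWeights: Vector[Double]): Vector[Double]
}

// One concrete loss, squared loss for a linear model:
// L(w) = 0.5 * (w . x - y)^2, gradient = (w . x - y) * x
object SquaredLoss extends LossFunction {
  def lossGradient(example: LabeledVector,
                   weights: Vector[Double]): (Double, Vector[Double]) = {
    val prediction =
      weights.zip(example.features).map { case (w, x) => w * x }.sum
    val residual = prediction - example.label
    (0.5 * residual * residual, example.features.map(_ * residual))
  }
}
```

With this split, a Learner only has to wire the three pieces together; swapping the loss or the regularization does not touch the Solver.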
As you will see in the TODOs, there are still many open questions about the
design. No real RegularizationType has been implemented yet, so that interface
could change depending on what we end up needing for the regularization
calculation.
A first implementation of Stochastic Gradient Descent is included. As you
will see, the stochastic part is still missing, as we are blocked on a sample
operator for DataSet. Instead we have to map over the whole data set.
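Until a sample operator is available, each iteration therefore computes the gradient over the full data set. A minimal local sketch of one such full-batch step using squared loss (helper names are hypothetical, and plain Scala collections stand in for a DataSet):

```scala
// One full-batch gradient descent step: without a sample operator, every
// iteration maps over the entire data set. Names are illustrative only.
def gradientStep(data: Seq[(Double, Vector[Double])], // (label, features)
                 weights: Vector[Double],
                 stepSize: Double): Vector[Double] = {
  // Per-example gradient of squared loss: (w . x - y) * x
  val sumGrad = data
    .map { case (label, features) =>
      val residual =
        weights.zip(features).map { case (w, x) => w * x }.sum - label
      features.map(_ * residual)
    }
    .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
  // Average the gradients and take a step against them.
  val avgGrad = sumGrad.map(_ / data.size)
  weights.zip(avgGrad).map { case (w, g) => w - stepSize * g }
}
```

Once sampling lands, only the `data` passed into the step would shrink to a mini-batch; the step itself stays the same.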
If you run the tests you will see that the third test, where we try to
perform just one step of the optimization, does not work. I haven't been able
to figure out why this happens yet; any help would be appreciated.
I've also included a wrapper for BLAS functions that was copied directly
from MLlib.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/thvasilo/flink optimization
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/613.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #613
----
commit 1ed6032b6505488549785ff38b5805586a0465cb
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-04-21T08:59:34Z
Interfaces for the optimization framework.
BLAS.scala was directly copied from the Apache Spark project.
commit 5a40f14790fd024fdd9a01069262627cda2126a4
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-04-21T09:01:50Z
Added Stochastic Gradient Descent initial version and some tests.
----
> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in
> different ML algorithms. Thus, it would be helpful to provide a generalized
> SGD implementation which can be instantiated with the respective gradient
> computation. Such a building block would make the development of future
> algorithms easier.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)