[ 
https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504664#comment-14504664
 ] 

ASF GitHub Bot commented on FLINK-1807:
---------------------------------------

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/613

    [WIP] - [FLINK-1807/1889] - Optimization framework and initial SGD
    implementation

    This is a WIP PR for the optimization framework of the Flink ML library.
    
    The design combines elements of how sklearn and Apache Spark structure
    the optimization frameworks for their learning algorithms.
    
    The idea is that a Learner can take a Solver, a LossFunction and a
    RegularizationType as parameters, similar to the design that sklearn uses
    and that Spark seems to be heading toward. This gives users flexibility in
    how they design their learning algorithms.
    
    A Solver uses the LossFunction and RegularizationType to optimize the
    weights against the provided DataSet of LabeledVector (label,
    featuresVector).
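
As a rough, language-agnostic illustration of the composition described above (this is not the Flink ML code in the PR), the sketch below wires a learner to a solver, a loss function and a regularization type. Every class and method name here is hypothetical, and plain Python lists stand in for a DataSet of LabeledVector.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
LabeledVector = Tuple[float, Vector]  # (label, featuresVector)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class SquaredLoss:
    """Hypothetical LossFunction for a linear model: 0.5 * (w.x - label)^2."""

    def gradient(self, weights: Vector, example: LabeledVector) -> Vector:
        label, features = example
        residual = dot(weights, features) - label
        return [residual * x for x in features]


class NoRegularization:
    """Hypothetical RegularizationType that adds nothing to the gradient."""

    def gradient(self, weights: Vector) -> Vector:
        return [0.0] * len(weights)


class GradientDescentSolver:
    """Hypothetical Solver: full-batch gradient descent, no sampling."""

    def __init__(self, step_size: float = 0.1, iterations: int = 200) -> None:
        self.step_size = step_size
        self.iterations = iterations

    def optimize(self, data, initial_weights, loss_function, regularization) -> Vector:
        weights = list(initial_weights)
        for _ in range(self.iterations):
            # Start from the regularization gradient, then add the mean loss gradient.
            grad = regularization.gradient(weights)
            for example in data:
                example_grad = loss_function.gradient(weights, example)
                grad = [g + eg / len(data) for g, eg in zip(grad, example_grad)]
            weights = [w - self.step_size * g for w, g in zip(weights, grad)]
        return weights


class LinearLearner:
    """Hypothetical Learner parameterized by solver, loss and regularization."""

    def __init__(self, solver, loss_function, regularization) -> None:
        self.solver = solver
        self.loss_function = loss_function
        self.regularization = regularization
        self.weights: Vector = []

    def fit(self, data: List[LabeledVector]) -> "LinearLearner":
        dimension = len(data[0][1])
        self.weights = self.solver.optimize(
            data, [0.0] * dimension, self.loss_function, self.regularization
        )
        return self

    def predict(self, features: Vector) -> float:
        return dot(self.weights, features)


# Toy data generated by label = 2 * feature.
data = [(2.0, [1.0]), (4.0, [2.0]), (6.0, [3.0])]
learner = LinearLearner(GradientDescentSolver(), SquaredLoss(), NoRegularization()).fit(data)
```

Swapping in a different loss or regularization object changes the learner's behavior without touching the solver, which is the flexibility the design aims for.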
    
    As you will see in the TODOs, many design questions are still open, and
    no real RegularizationType has been implemented yet, so that interface
    could change depending on what we end up needing for the regularization
    calculation.
    
    A first implementation of Stochastic Gradient Descent is included. As you
    will see, the stochastic part is still missing, because we are blocked on a
    sample operator for DataSet. Instead, we have to map over the whole data.
    If you run the tests, you will see that the third test, where we try to
    perform just one step of the optimization, does not work. I haven't been
    able to figure out why this happens yet; any help would be appreciated.
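
For reference when checking a single optimization step by hand, here is a minimal sketch of one full-batch gradient step (every example is visited, since no sampling is available). It is not the PR's implementation: the function name is hypothetical, squared loss on a linear model is assumed, and the per-example gradients are averaged, whereas the PR may sum or scale them differently.

```python
from typing import List, Tuple

LabeledVector = Tuple[float, List[float]]  # (label, featuresVector)


def batch_gradient_step(
    weights: List[float], data: List[LabeledVector], step_size: float
) -> List[float]:
    """One full-batch step for squared loss on a linear model.

    Assumption: per-example gradients are averaged over the data.
    """
    total = [0.0] * len(weights)
    for label, features in data:
        prediction = sum(w * x for w, x in zip(weights, features))
        residual = prediction - label
        # Gradient of 0.5 * residual^2 with respect to the weights.
        total = [t + residual * x for t, x in zip(total, features)]
    mean_grad = [t / len(data) for t in total]
    return [w - step_size * g for w, g in zip(weights, mean_grad)]


data = [(2.0, [1.0]), (4.0, [2.0])]
new_weights = batch_gradient_step([0.0], data, step_size=0.1)
# mean gradient = ((0 - 2) * 1 + (0 - 4) * 2) / 2 = -5, so w = 0 - 0.1 * (-5) = 0.5
```

A hand-computed case like this can serve as the expected value in a one-step test, provided the test uses the same averaging convention as the solver.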
    
    I've also included a wrapper for BLAS functions that was copied directly
    from MLlib.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 1ed6032b6505488549785ff38b5805586a0465cb
Author: Theodore Vasiloudis <[email protected]>
Date:   2015-04-21T08:59:34Z

    Interfaces for the optimization framework.
    
    BLAS.scala was directly copied from the Apache Spark project.

commit 5a40f14790fd024fdd9a01069262627cda2126a4
Author: Theodore Vasiloudis <[email protected]>
Date:   2015-04-21T09:01:50Z

    Added Stochastic Gradient Descent initial version and some tests.

----


> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
>                 Key: FLINK-1807
>                 URL: https://issues.apache.org/jira/browse/FLINK-1807
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in 
> different ML algorithms. Thus, it would be helpful to provide a generalized 
> SGD implementation which can be instantiated with the respective gradient 
> computation. Such a building block would make the development of future 
> algorithms easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
