[ https://issues.apache.org/jira/browse/FLINK-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558559#comment-14558559 ]

ASF GitHub Bot commented on FLINK-2053:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/723

    [FLINK-2053] [ml] Adds automatic preregistration of ML types

    Adds automatic registration of the flink-ml types. This is done by a new
    type registration method, `FlinkMLTools.registerFlinkMLTypes`, which is
    called from within the `fit`, `predict` and `transform` methods of the
    `Estimator`, `Predictor` and `Transformer`, respectively.
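The registration hook described above can be sketched as follows. This is a minimal, self-contained Java illustration under stated assumptions, not the actual flink-ml code: `Env`, `registerMLTypes`, `DenseVector`, `SparseVector` and `NoOpPredictor` are hypothetical stand-ins (the real helper is `FlinkMLTools.registerFlinkMLTypes`, and the real `fit`/`predict`/`transform` methods are written in Scala). It only shows the pattern: each public entry point calls the registration helper before delegating to the algorithm, so users never register the types themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types only mimic the shape of a FlinkML Predictor;
// they are not the real Flink classes.
class AutoRegistrationSketch {

    // Hypothetical stand-in for the execution environment; it records every
    // registerType call in a plain list (no de-duplication).
    static final class Env {
        final List<Class<?>> registeredTypes = new ArrayList<>();

        void registerType(Class<?> type) {
            registeredTypes.add(type);
        }
    }

    // Hypothetical placeholders for ML types that would otherwise reach Kryo unregistered.
    static final class DenseVector {}
    static final class SparseVector {}

    // Hypothetical counterpart of FlinkMLTools.registerFlinkMLTypes.
    static void registerMLTypes(Env env) {
        env.registerType(DenseVector.class);
        env.registerType(SparseVector.class);
    }

    // Base class whose entry points register the ML types before doing any work.
    abstract static class Predictor {
        final void fit(Env env) {
            registerMLTypes(env);
            doFit(env);
        }

        final void predict(Env env) {
            registerMLTypes(env);
            doPredict(env);
        }

        abstract void doFit(Env env);

        abstract void doPredict(Env env);
    }

    // Trivial algorithm used only to exercise the entry points.
    static final class NoOpPredictor extends Predictor {
        @Override
        void doFit(Env env) {}

        @Override
        void doPredict(Env env) {}
    }

    public static void main(String[] args) {
        Env env = new Env();
        Predictor predictor = new NoOpPredictor();
        predictor.fit(env);
        predictor.predict(env);
        // Each entry point registered both types, so the list holds 4 entries.
        System.out.println(env.registeredTypes.size());
    }
}
```

With a plain list as storage, calling `fit` and then `predict` records each type twice (four entries here), which is why the registry itself needs to ignore repeated registrations.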
    
    Adds de-duplication of the types registered at the `ExecutionConfig` by
    storing them in a `LinkedHashSet`, which preserves insertion order.
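A minimal sketch of insertion-ordered de-duplication, assuming a simplified configuration object; `Config` and `getRegisteredTypes` are hypothetical names, not the `ExecutionConfig` API. `LinkedHashSet.add` ignores a class that is already present, and iteration follows first-insertion order. One plausible reason the order matters: if a serializer hands out IDs in registration order, every JVM that registers the same classes in the same order ends up with the same IDs, whereas a plain `HashSet` iterates in an order derived from identity hash codes, which can differ between JVMs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a config that stores registered types in a
// LinkedHashSet: duplicates are dropped, first-registration order is kept.
class DedupRegistrySketch {

    static final class Config {
        private final Set<Class<?>> registeredTypes = new LinkedHashSet<>();

        void registerType(Class<?> type) {
            // add() leaves the set unchanged if the class is already present.
            registeredTypes.add(type);
        }

        // Snapshot of the registered types in first-registration order.
        List<Class<?>> getRegisteredTypes() {
            return new ArrayList<>(registeredTypes);
        }
    }

    public static void main(String[] args) {
        Config config = new Config();
        // Simulate two entry points (e.g. fit, then predict) registering the same types.
        for (int call = 0; call < 2; call++) {
            config.registerType(String.class);
            config.registerType(Integer.class);
        }
        config.registerType(Double.class);

        List<Class<?>> types = config.getRegisteredTypes();
        // The position in this list is stable, so it could serve as a registration ID.
        for (int id = 0; id < types.size(); id++) {
            System.out.println(id + " -> " + types.get(id).getSimpleName());
        }
    }
}
```

Running `main` prints `String`, `Integer` and `Double` at positions 0, 1 and 2, even though `String` and `Integer` were each registered twice.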
    
    Fixes a bug in the `BreezeSparseVector` to `FlinkSparseVector` conversion.
    A `BreezeSparseVector` is not always fully compacted, so some entries of
    its backing arrays can be unused. Consequently, only the used portion of
    these arrays should be passed to the `FlinkSparseVector`.
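The corrected conversion amounts to copying only the used prefix of the backing arrays. The sketch below assumes the source vector exposes its index and data arrays plus a count of used entries (Breeze calls this count `activeSize`); `CompactSparseVector` and `fromGrowableArrays` are hypothetical names, not the FlinkML `SparseVector` API.

```java
import java.util.Arrays;

// Hypothetical sketch: converting a sparse vector whose backing arrays may have
// spare capacity beyond the number of entries actually in use (activeSize).
class SparseTrimSketch {

    // Minimal sparse vector: indices[i] is the position of values[i].
    static final class CompactSparseVector {
        final int size;
        final int[] indices;
        final double[] values;

        CompactSparseVector(int size, int[] indices, double[] values) {
            if (indices.length != values.length) {
                throw new IllegalArgumentException("indices and values differ in length");
            }
            this.size = size;
            this.indices = indices;
            this.values = values;
        }
    }

    // Copy only the first activeSize entries; anything after them is unused
    // capacity and must not become part of the converted vector.
    static CompactSparseVector fromGrowableArrays(int size, int[] index, double[] data, int activeSize) {
        if (activeSize < 0 || activeSize > index.length || activeSize > data.length) {
            throw new IllegalArgumentException("activeSize out of range: " + activeSize);
        }
        return new CompactSparseVector(
                size,
                Arrays.copyOf(index, activeSize),
                Arrays.copyOf(data, activeSize));
    }

    public static void main(String[] args) {
        // Backing arrays have capacity 4, but only the first 2 entries are active.
        int[] index = {1, 3, 0, 0};
        double[] data = {2.5, -1.0, 0.0, 0.0};
        CompactSparseVector v = fromGrowableArrays(5, index, data, 2);
        System.out.println(Arrays.toString(v.indices) + " " + Arrays.toString(v.values));
    }
}
```

Here `main` prints `[1, 3] [2.5, -1.0]`. Passing the whole arrays instead would add two spurious entries at index 0 and break the ascending order of the indices.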

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink preregisterMLTypes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #723
    
----
commit 483caef1276c80f60bcc6c97836c8008d62ec72b
Author: Till Rohrmann <[email protected]>
Date:   2015-05-25T22:35:05Z

    [FLINK-2053] [ml] Adds automatic type registration of flink-ml types. Adds
    de-duplication of registered types at ExecutionConfig. Fixes bug in Breeze
    SparseVector to Flink SparseVector conversion.

----


> Preregister ML types for Kryo serialization
> -------------------------------------------
>
>                 Key: FLINK-2053
>                 URL: https://issues.apache.org/jira/browse/FLINK-2053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>              Labels: ML
>             Fix For: 0.9
>
>
> Currently, FlinkML uses interfaces and abstract types to implement generic
> algorithms. As a consequence, Kryo has to be used to serialize the concrete
> subtypes. To speed up data transfer, these types have to be preregistered
> so that they are assigned fixed IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
