[
https://issues.apache.org/jira/browse/FLINK-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558559#comment-14558559
]
ASF GitHub Bot commented on FLINK-2053:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/723
[FLINK-2053] [ml] Adds automatic preregistration of ML types
Adds automatic type registration of flink-ml types. This is done by
providing a type registration method, `FlinkMLTools.registerFlinkMLTypes`, which
is called from within the `fit`, `predict` and `transform` methods of
`Estimator`, `Predictor` and `Transformer`, respectively.
Adds de-duplication of registered types in the `ExecutionConfig` by using a
`LinkedHashSet`, which maintains insertion order.
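The de-duplication idea can be sketched in a few lines of plain Java. This is only an illustration, not the code from the pull request: `TypeRegistrySketch`, `registerType` and `registeredInOrder` are hypothetical names standing in for the relevant part of `ExecutionConfig`. The point is that repeated registration calls (for example one per `fit`, `predict` or `transform` invocation) leave the set unchanged, while the order of first registration stays stable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the type bookkeeping inside ExecutionConfig. */
class TypeRegistrySketch {
    // LinkedHashSet ignores duplicates and iterates in first-insertion order.
    private final Set<Class<?>> registeredTypes = new LinkedHashSet<>();

    /** Registers a type; returns false if it was already registered. */
    boolean registerType(Class<?> type) {
        return registeredTypes.add(type);
    }

    /** Snapshot of the registered types in the order they were first added. */
    List<Class<?>> registeredInOrder() {
        return new ArrayList<>(registeredTypes);
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        // Simulate several operators each triggering the same registration.
        for (int call = 0; call < 3; call++) {
            registry.registerType(double[].class);
            registry.registerType(int[].class);
        }
        // Six calls, but only two distinct entries remain.
        System.out.println(registry.registeredInOrder().size()); // prints 2
    }
}
```

A stable order matters under the assumption that IDs are handed out in registration order: a plain `HashSet` would also remove duplicates, but its iteration order is not guaranteed to be the same everywhere.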
Fixes a bug in the `BreezeSparseVector` to `FlinkSparseVector` conversion.
A `BreezeSparseVector` is not always fully compacted, so its backing arrays
can contain unused entries. Consequently, only the used part of the data
arrays should be passed to the `FlinkSparseVector`.
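A minimal sketch of the conversion fix, written as self-contained Java rather than against the Breeze or Flink APIs. All names here (`SparseTrimSketch`, `SparseVec`, `fromBuffers`, and the `used` count) are hypothetical; the sketch assumes the source vector keeps its populated entries at the front of over-allocated index and data buffers and can report how many there are.

```java
import java.util.Arrays;

class SparseTrimSketch {
    /** Minimal sparse vector: parallel arrays of indices and their values. */
    static final class SparseVec {
        final int size;
        final int[] indices;
        final double[] data;

        SparseVec(int size, int[] indices, double[] data) {
            if (indices.length != data.length) {
                throw new IllegalArgumentException("indices and data must have equal length");
            }
            this.size = size;
            this.indices = indices;
            this.data = data;
        }
    }

    /**
     * Builds a vector from buffers that may be larger than needed.
     * Only the first `used` slots of each buffer are copied; the rest is padding.
     */
    static SparseVec fromBuffers(int size, int[] indexBuffer, double[] dataBuffer, int used) {
        if (used < 0 || used > indexBuffer.length || used > dataBuffer.length) {
            throw new IllegalArgumentException("used is out of range for the given buffers");
        }
        return new SparseVec(size, Arrays.copyOf(indexBuffer, used), Arrays.copyOf(dataBuffer, used));
    }

    public static void main(String[] args) {
        // Buffers with capacity 4, of which only 2 slots hold real entries.
        int[] indexBuffer = {1, 4, 0, 0};
        double[] dataBuffer = {2.0, 3.0, 0.0, 0.0};
        SparseVec v = fromBuffers(10, indexBuffer, dataBuffer, 2);
        System.out.println(Arrays.toString(v.indices) + " " + Arrays.toString(v.data));
    }
}
```

Passing the whole buffers instead would hand the padding slots to the target vector as if they were real entries, which is the kind of error the fix above avoids.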
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink preregisterMLTypes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/723.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #723
----
commit 483caef1276c80f60bcc6c97836c8008d62ec72b
Author: Till Rohrmann <[email protected]>
Date: 2015-05-25T22:35:05Z
[FLINK-2053] [ml] Adds automatic type registration of flink-ml types. Adds
de-duplication of registered types at ExecutionConfig. Fixes bug in Breeze
SparseVector to Flink SparseVector conversion.
----
> Preregister ML types for Kryo serialization
> -------------------------------------------
>
> Key: FLINK-2053
> URL: https://issues.apache.org/jira/browse/FLINK-2053
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Labels: ML
> Fix For: 0.9
>
>
> Currently, FlinkML uses interfaces and abstract types to implement generic
> algorithms. As a consequence, we have to use Kryo to serialize the concrete
> subtypes. To speed up data transfer, these types should be preregistered
> so that they are assigned fixed IDs.
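To illustrate why fixed IDs help, here is a deliberately simplified model in plain Java. It does not reproduce Kryo's actual wire format: it only compares tagging a record with its fully qualified class name against tagging it with a small integer. `ClassTagSizeSketch` and its methods are hypothetical, and the class name is used purely as an example string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassTagSizeSketch {
    /** Serialized size when the record is tagged with its class name. */
    static int sizeWithClassName(String className, double[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(className); // 2-byte length prefix plus the encoded name
            for (double value : payload) {
                out.writeDouble(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Serialized size when the record is tagged with a preassigned integer ID. */
    static int sizeWithFixedId(int typeId, double[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(typeId); // always 4 bytes in this model
            for (double value : payload) {
                out.writeDouble(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String name = "org.apache.flink.ml.math.DenseVector"; // example string only
        double[] payload = {1.0, 2.0, 3.0};
        int saved = sizeWithClassName(name, payload) - sizeWithFixedId(7, payload);
        System.out.println("bytes saved per record in this model: " + saved);
    }
}
```

In this model the saving grows with the length of the class name and is paid on every tagged record, which is why short records with long type names benefit most.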