[
https://issues.apache.org/jira/browse/FLINK-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhipeng Zhang updated FLINK-27826:
----------------------------------
Summary: Support machine learning training for very high dimensional models
(was: Support machine learning training for high dimensional models)
> Support machine learning training for very high dimensional models
> ------------------------------------------------------------------
>
> Key: FLINK-27826
> URL: https://issues.apache.org/jira/browse/FLINK-27826
> Project: Flink
> Issue Type: New Feature
> Components: Library / Machine Learning
> Reporter: Zhipeng Zhang
> Assignee: Zhipeng Zhang
> Priority: Major
>
> FlinkML has limited support for training high dimensional machine learning
> models, even though this is often needed, especially in industrial use cases.
> When the model parameters cannot be held in the memory of a single machine,
> FlinkML currently crashes.
> So it is useful to support high dimensional model training in FlinkML. To
> achieve this, we probably need to do the following things:
> # Survey how existing machine learning systems train large models (e.g.
> data parallel, model parallel)
> # Define and implement the infrastructure for large model training in FlinkML
> # Implement a logistic regression model that can train models with more than
> ten billion parameters
> # Benchmark the implementation and further improve it
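
To make the memory constraint and the model-parallel idea in the list above concrete, here is a minimal single-process sketch in Python. It is not FlinkML code and does not reflect any design chosen for this issue. The names WeightShard, make_shards, sgd_step and dense_model_bytes are hypothetical, the 8 bytes per parameter figure assumes dense double-precision weights, and the "workers" are simulated as plain objects. Each shard owns a contiguous range of weight indices, so no single shard has to hold the whole model.

```python
import math


def dense_model_bytes(num_params, bytes_per_param=8):
    """Memory for a dense weight vector (assumes 8-byte doubles by default)."""
    return num_params * bytes_per_param


class WeightShard:
    """Hypothetical worker that owns weights for the index range [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.weights = [0.0] * (end - start)

    def partial_dot(self, features):
        # features is a sparse example: {global feature index: value}.
        # Only the indices owned by this shard contribute.
        return sum(v * self.weights[i - self.start]
                   for i, v in features.items()
                   if self.start <= i < self.end)

    def apply_gradient(self, features, error, lr):
        # Logistic-loss gradient for weight i is error * x_i.
        for i, v in features.items():
            if self.start <= i < self.end:
                self.weights[i - self.start] -= lr * error * v


def make_shards(dim, num_shards):
    """Split [0, dim) into contiguous ranges of near-equal size."""
    size = -(-dim // num_shards)  # ceiling division
    return [WeightShard(s, min(s + size, dim)) for s in range(0, dim, size)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(shards, features):
    # Each shard returns a partial dot product; the sum is the full w . x.
    return sigmoid(sum(s.partial_dot(features) for s in shards))


def sgd_step(shards, features, label, lr=0.5):
    # Only the scalar error is shared with the shards, never the full weights.
    error = predict(shards, features) - label
    for s in shards:
        s.apply_gradient(features, error, lr)
```

Under the 8-byte assumption, dense_model_bytes(10_000_000_000) gives 80,000,000,000 bytes (80 GB) for the weights alone, which is why a model with ten billion parameters would not fit on a typical single machine. A real implementation would also need to handle communication, fault tolerance and sparse storage, none of which this sketch attempts.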
--
This message was sent by Atlassian Jira
(v8.20.7#820007)