Zhipeng Zhang created FLINK-27826:
-------------------------------------
Summary: Support machine learning training for high dimensional
models
Key: FLINK-27826
URL: https://issues.apache.org/jira/browse/FLINK-27826
Project: Flink
Issue Type: New Feature
Components: Library / Machine Learning
Reporter: Zhipeng Zhang
Assignee: Zhipeng Zhang
FlinkML has only limited support for training high dimensional machine learning
models, although such models are often useful, especially in industrial use
cases. Currently, when the model parameters cannot be held in the memory of a
single machine, FlinkML crashes.
So it is useful to support high dimensional model training in FlinkML. To
achieve this, we probably need to do the following things:
# Survey how existing machine learning systems train large models
(e.g. data parallel, model parallel)
# Define and implement the infrastructure for large model training in FlinkML
# Implement logistic regression training that can handle models with more than
ten billion parameters
# Benchmark the implementation and further improve it
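To make the model parallel idea in the list above concrete, here is a minimal sketch in plain Python. It is not FlinkML code and not a proposed design. The names ShardedModel, pull, push and sgd_step are hypothetical, and the modulo partitioning is only an assumption chosen for illustration. The point is that each shard owns a slice of the weights and a sparse sample only touches the weights it needs, so no single machine has to hold the full vector (assuming 8-byte double weights, ten billion parameters would take roughly 80 GB as a dense array).

```python
import math


class ShardedModel:
    """Hypothetical model-parallel parameter store: each shard owns the
    weights whose global index satisfies index % num_shards == shard_id."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # Each shard stores only the weights that were updated, keyed by global index.
        self.shards = [dict() for _ in range(num_shards)]

    def pull(self, indices):
        """Fetch only the requested weights (weights never updated are 0.0)."""
        return {i: self.shards[i % self.num_shards].get(i, 0.0) for i in indices}

    def push(self, grads, lr):
        """Apply a sparse gradient update to the owning shards."""
        for i, g in grads.items():
            shard = self.shards[i % self.num_shards]
            shard[i] = shard.get(i, 0.0) - lr * g


def sgd_step(model, features, label, lr=0.1):
    """One logistic regression SGD step on a sparse sample.

    features: dict of global feature index -> value; label: 0 or 1.
    Returns the predicted probability before the update.
    """
    weights = model.pull(features.keys())
    z = sum(weights[i] * x for i, x in features.items())
    p = 1.0 / (1.0 + math.exp(-z))
    # Gradient of the log loss with respect to w_i is (p - y) * x_i.
    model.push({i: (p - label) * x for i, x in features.items()}, lr)
    return p


model = ShardedModel(num_shards=4)
# A sample with two non-zero features in a space of ten billion dimensions.
sample = {3: 1.0, 9_999_999_999: 2.0}
first = sgd_step(model, sample, label=1)
second = sgd_step(model, sample, label=1)
```

In a real system the shards would live on different workers and pull/push would be network calls. Here they are in-process dictionaries so the sketch stays runnable.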
--
This message was sent by Atlassian Jira
(v8.20.7#820007)