[
https://issues.apache.org/jira/browse/FLINK-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dian Fu updated FLINK-18235:
----------------------------
Description: Currently, when a checkpoint is triggered for the Python
operator, all the data buffered will be flushed to the Python worker to be
processed. This will increase the overall checkpoint time in case there are a
lot of elements buffered and Python UDF is slow. We should improve the
checkpoint strategy to improve this. One way is to control the number of data
buffered in the pipeline between Java/Python processes, similar to what
[FLIP-183|https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment]
does to control the number of data buffered in the network. We can also let
users to config the checkpoint strategy if needed. (was: Currently, when a
checkpoint is triggered for the Python operator, all the data buffered will be
flushed to the Python worker to be processed. This will increase the overall
checkpoint time in case there are a lot of elements buffered and Python UDF is
slow. We should improve the checkpoint strategy to improve this, e.g. buffering
the data into state instead of flushing them out. We can also let users to
config the checkpoint strategy if needed.)
> Improve the checkpoint strategy for Python UDF execution
> --------------------------------------------------------
>
> Key: FLINK-18235
> URL: https://issues.apache.org/jira/browse/FLINK-18235
> Project: Flink
> Issue Type: Improvement
> Components: API / Python
> Reporter: Dian Fu
> Priority: Minor
> Labels: auto-deprioritized-major
> Fix For: 1.15.0
>
>
> Currently, when a checkpoint is triggered for the Python operator, all the
> data buffered will be flushed to the Python worker to be processed. This will
> increase the overall checkpoint time in case there are a lot of elements
> buffered and Python UDF is slow. We should improve the checkpoint strategy to
> improve this. One way is to control the number of data buffered in the
> pipeline between Java/Python processes, similar to what
> [FLIP-183|https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment]
> does to control the number of data buffered in the network. We can also let
> users to config the checkpoint strategy if needed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)