[jira] [Updated] (FLINK-18235) Improve the checkpoint strategy for Python UDF execution

Dian Fu (Jira) Wed, 13 Oct 2021 00:53:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dian Fu updated FLINK-18235:
----------------------------
    Description: Currently, when a checkpoint is triggered for the Python 
operator, all buffered data is flushed to the Python worker to be processed. 
This increases the overall checkpoint time when many elements are buffered and 
the Python UDF is slow. We should improve the checkpoint strategy to address 
this. One way is to limit the amount of data buffered in the pipeline between 
the Java and Python processes, similar to how 
[FLIP-183|https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment]
 limits the amount of data buffered in the network. We can also let users 
configure the checkpoint strategy if needed.  (was: Currently, when a 
checkpoint is triggered for the Python operator, all buffered data is flushed 
to the Python worker to be processed. This increases the overall checkpoint 
time when many elements are buffered and the Python UDF is slow. We should 
improve the checkpoint strategy to address this, e.g. by buffering the data 
into state instead of flushing it out. We can also let users configure the 
checkpoint strategy if needed.)

> Improve the checkpoint strategy for Python UDF execution
> --------------------------------------------------------
>
>                 Key: FLINK-18235
>                 URL: https://issues.apache.org/jira/browse/FLINK-18235
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Python
>            Reporter: Dian Fu
>            Priority: Minor
>              Labels: auto-deprioritized-major
>             Fix For: 1.15.0
>
>
> Currently, when a checkpoint is triggered for the Python operator, all 
> buffered data is flushed to the Python worker to be processed. This 
> increases the overall checkpoint time when many elements are buffered and 
> the Python UDF is slow. We should improve the checkpoint strategy to address 
> this. One way is to limit the amount of data buffered in the pipeline 
> between the Java and Python processes, similar to how 
> [FLIP-183|https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment]
>  limits the amount of data buffered in the network. We can also let users 
> configure the checkpoint strategy if needed.
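
To illustrate the idea of capping the data buffered between the Java and
Python processes, here is a minimal sketch in plain Python. It is not Flink
code and not the implementation proposed in this issue: the class
BoundedInFlightBuffer, the max_in_flight parameter, and the process callback
(a stand-in for a slow Python UDF) are all hypothetical names chosen for the
example. The point it shows is that when the backlog is capped, the work left
to do at checkpoint time is bounded by the cap rather than by however much
data happened to accumulate.

```python
from collections import deque


class BoundedInFlightBuffer:
    """Hypothetical illustration (not Flink's API): caps the number of
    elements waiting to be processed by the Python worker."""

    def __init__(self, max_in_flight, process):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight
        self.process = process  # stands in for the (possibly slow) Python UDF
        self.pending = deque()
        self.results = []

    def add(self, element):
        # When the cap is reached, process the oldest element first so the
        # backlog never grows beyond max_in_flight.
        if len(self.pending) >= self.max_in_flight:
            self._drain(1)
        self.pending.append(element)

    def _drain(self, count):
        # Process up to `count` pending elements in arrival order.
        for _ in range(min(count, len(self.pending))):
            self.results.append(self.process(self.pending.popleft()))

    def flush_for_checkpoint(self):
        # Everything still pending must be processed before the checkpoint
        # can complete; this is at most max_in_flight elements.
        flushed = len(self.pending)
        self._drain(flushed)
        return flushed


buf = BoundedInFlightBuffer(max_in_flight=3, process=lambda x: x * 2)
for i in range(10):
    buf.add(i)
flushed = buf.flush_for_checkpoint()
print(flushed, buf.results)
```

In this sketch the producer pays for the cap by processing an element
synchronously whenever the buffer is full. A real pipeline would more likely
apply backpressure instead, and the right cap would depend on how slow the
UDF is, which is the kind of dynamic adjustment FLIP-183 makes for network
buffers.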



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
