[ 
https://issues.apache.org/jira/browse/FLINK-36682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896979#comment-17896979
 ] 

Runkang He commented on FLINK-36682:
------------------------------------

I agree with your proposal. In most use cases, the MySQL primary key is an 
auto-increment id, so if the insert QPS is high during the snapshot phase, the 
last split may contain too many records. This problem may also occur in MongoDB 
CDC: the default _id field (an ObjectId) starts with a creation timestamp, so 
_id values can basically be considered to be in ascending order.
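
To make the effect concrete, here is a minimal sketch of the scenario in the 
issue description. It is not Flink CDC code: the helper names (split_evenly, 
rows_in_chunk, assign_order) are hypothetical, and it assumes dense integer ids 
starting at 0. It shows how the unbounded last chunk grows while ids keep 
increasing, and what a "last split first" ordering would look like.

```python
from typing import List, Optional, Tuple

# A chunk is a half-open id range [low, high); None means unbounded.
Chunk = Tuple[Optional[int], Optional[int]]


def split_evenly(min_id: int, max_id: int, chunk_size: int) -> List[Chunk]:
    """Split the id range into ascending chunks; first and last are unbounded."""
    bounds = list(range(min_id + chunk_size, max_id + 1, chunk_size))
    return list(zip([None] + bounds, bounds + [None]))


def rows_in_chunk(chunk: Chunk, min_id: int, current_max_id: int) -> int:
    """Count rows in a chunk, assuming every id in [min_id, current_max_id] exists."""
    low = min_id if chunk[0] is None else chunk[0]
    upper = current_max_id + 1
    high = upper if chunk[1] is None else min(chunk[1], upper)
    return max(0, high - low)


def assign_order(chunks: List[Chunk], last_first: bool) -> List[Chunk]:
    """Return the assignment order; optionally hand out the last chunk first."""
    if last_first and chunks:
        return [chunks[-1]] + chunks[:-1]
    return list(chunks)


chunks = split_evenly(0, 1_500_000, 10_000)
# Size of the last chunk when it was planned vs. after ids grew to 3,000,000.
print(rows_in_chunk(chunks[-1], 0, 1_500_000))
print(rows_in_chunk(chunks[-1], 0, 3_000_000))
print(assign_order(chunks, last_first=True)[0])
```

Reading the last chunk first would shorten the window in which new inserts can 
land in it before it is read, which is the idea behind the proposed default.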

> Add split assign strategy to avoid OOM error in TaskManager
> -----------------------------------------------------------
>
>                 Key: FLINK-36682
>                 URL: https://issues.apache.org/jira/browse/FLINK-36682
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.3.0
>            Reporter: Yanquan Lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: cdc-3.3.0
>
>
> During the snapshot reading phase, we split the table into chunks and assign 
> them to the split readers in the TaskManager.
> With even chunk splitting, the chunks are assigned in ascending order. For 
> example, a table whose primary key is id may be split into chunks like 
> [-∞, 10000), [10000, 20000), [20000, 30000), ..., [1500000, +∞). However, 
> during the snapshot reading phase, more records may be inserted and the 
> maximum id may grow considerably, so the last split may need to fetch too 
> many records. For example, the last split may need to fetch records in the 
> range [1500000, 3000000], which can cause the TaskManager to run out of memory.
> So I propose adding a strategy that lets users configure how splits are 
> assigned; by default, we can send the last split to the split reader first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
