[ https://issues.apache.org/jira/browse/FLINK-38782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067985#comment-18067985 ]

Joekwal commented on FLINK-38782:
---------------------------------

Hi [~chengcong]!

Thanks for the detailed explanation. Understood that errors.tolerance=all could 
be risky because changeStreamNotValid errors would be swallowed, with potential 
data loss.

What we actually need is not to skip the entire document, but to exclude 
specific oversized fields at the source level — something like 
`exclude.fields=fieldA,fieldB`. That way we could safely bypass the 16MB BSON 
limit without dropping whole documents or risking silent data loss on 
changeStreamNotValid.
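
To make the proposal concrete: MongoDB 4.2+ allows a `$unset` stage in the change-stream aggregation pipeline, so excluded paths could be stripped from `fullDocument` server-side, before the event is measured against the 16MB limit. A minimal sketch of how the connector might translate such a config into a pipeline — the config key `exclude.fields` and the helper name are hypothetical, not an existing Flink CDC option:

```python
# Sketch: map a hypothetical `exclude.fields` config value ("fieldA,fieldB")
# to a change-stream aggregation stage that removes those fields from
# fullDocument server-side, so oversized fields never reach the 16MB check.
# The config key and function name are illustrative, not an existing API.

def build_exclusion_pipeline(exclude_fields: str) -> list:
    """Turn "fieldA,fieldB" into a $unset stage on fullDocument paths."""
    fields = [f.strip() for f in exclude_fields.split(",") if f.strip()]
    if not fields:
        return []
    # $unset as an aggregation pipeline stage requires MongoDB 4.2+.
    return [{"$unset": ["fullDocument." + f for f in fields]}]

# The resulting pipeline could then be passed to the change stream, e.g.
#   collection.watch(pipeline=build_exclusion_pipeline("fieldA,fieldB"))
print(build_exclusion_pipeline("fieldA,fieldB"))
# → [{'$unset': ['fullDocument.fieldA', 'fullDocument.fieldB']}]
```
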

> Mongodb CDC need a config to filter large size column
> -----------------------------------------------------
>
>                 Key: FLINK-38782
>                 URL: https://issues.apache.org/jira/browse/FLINK-38782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>            Reporter: Joekwal
>            Priority: Blocker
>
> An error occurred:
> {color:#ff3333}com.mongodb.MongoQueryException: Query failed with error code 
> 10334 and error message 'Executor error during getMore :: caused by :: 
> BSONObj size: 27661090 (0x1A61322) is invalid. Size must be between 0 and 
> 16793600(16MB) First element: _id: { _data: "xxx" }' on server{color}
> I've solved the source problem in MongoDB, but the developers may need a 
> config to skip oversized columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)