[ 
https://issues.apache.org/jira/browse/FLINK-39795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39795:
-----------------------------------
    Labels: pull-request-available  (was: )

> MongoDB CDC does not use index when reading shard collection chunks
> -------------------------------------------------------------------
>
>                 Key: FLINK-39795
>                 URL: https://issues.apache.org/jira/browse/FLINK-39795
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.6.0
>            Reporter: Xiqian Yu
>            Assignee: Xiqian Yu
>            Priority: Major
>              Labels: pull-request-available
>
> In MongoUtils, we read the entire `config.chunks` collection to obtain the 
> chunk info of a sharded collection, sorted by the `min` field. An `$or` 
> expression is added for backwards compatibility with MongoDB 4.8 and lower.
>  
> {code:java}
> public static List<BsonDocument> readChunks(
>         MongoClient mongoClient, BsonDocument collectionMetadata) {
>     MongoCollection<BsonDocument> chunks =
>             collectionFor(mongoClient, TableId.parse("config.chunks"), BsonDocument.class);
>     List<BsonDocument> collectionChunks = new ArrayList<>();
>     Bson filter =
>             or(
>                     new BsonDocument(NAMESPACE_FIELD, collectionMetadata.get(ID_FIELD)),
>                     // MongoDB 4.9.0 removed ns field of config.chunks collection, using
>                     // collection's uuid instead.
>                     // See: https://jira.mongodb.org/browse/SERVER-53105
>                     new BsonDocument(UUID_FIELD, collectionMetadata.get(UUID_FIELD)));
>     chunks.find(filter)
>             .projection(include("min", "max", "shard"))
>             .sort(ascending("min"))
>             .into(collectionChunks);
>     return collectionChunks;
> }{code}
> The problem is that, with the extra `$or` filter, the query cannot use an 
> index to return the chunks in sorted order. The chunks are sorted in memory 
> instead, which causes the following exception once the sort exceeds the 
> memory limit:
>  
> {code:java}
> Read config.chunks collection failed: Encountered non-retryable error during 
> query :: caused by :: Executor error during find command :: caused by :: Sort 
> operation used more than the maximum 33554432 bytes of RAM. Add an index, or 
> specify a smaller limit. {code}
> Instead, we could check the MongoDB server version explicitly and filter on 
> a single field, so that the filter and the sort can be served by an index.
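
A minimal sketch of the version check proposed above, using only the Java standard library. The class and method names are hypothetical and are not taken from the Flink CDC code base. The field names `ns` and `uuid` are assumed from the code comment in the issue, and the 4.9 cutoff follows the referenced SERVER-53105 note. How the server version string is obtained is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks a single filter field for config.chunks
// based on the server version, instead of combining both with $or.
final class ChunkFilterSketch {
    // Matches the leading "major.minor" of a version string such as "4.4.18".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    // Returns true when the server is 4.9 or newer, where (per the issue's
    // code comment) config.chunks refers to the collection's uuid, not ns.
    static boolean chunksKeyedByUuid(String serverVersion) {
        Matcher m = MAJOR_MINOR.matcher(serverVersion);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + serverVersion);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major > 4 || (major == 4 && minor >= 9);
    }

    // Assumed field names: "uuid" for 4.9+ and "ns" for older servers.
    static String chunkFilterField(String serverVersion) {
        return chunksKeyedByUuid(serverVersion) ? "uuid" : "ns";
    }
}
```

With a single field chosen this way, `readChunks` could build one equality filter rather than the `or(...)` expression, for example `new BsonDocument(field, value)`, and keep the existing `sort(ascending("min"))`.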



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
