wzx140 opened a new pull request, #8050: URL: https://github.com/apache/paimon/pull/8050
### Purpose

The default `newReadBuilder` implementation in `ReadonlyTable` creates `ReadBuilderImpl(this)`, so the returned `ReadBuilder` captures the table that created it. For `KnownSplitsTable`, this means the builder captures the `KnownSplitsTable` itself. `KnownSplitsTable` holds every known split, but a `ReadBuilder` obtained from it should not carry all of those splits along. This PR therefore makes `KnownSplitsTable` delegate `newReadBuilder` to the origin table.

The issue can occur when merging into a data-evolution table. The Spark reader factory holds the `ReadBuilder`, `ReadBuilderImpl` holds the `KnownSplitsTable` through its `table` field, and `KnownSplitsTable` holds the full split list. As a result, each Spark task can become very large, because serializing the reader factory indirectly serializes all known splits.
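The reference chain described above can be reproduced in a minimal, self-contained sketch. It uses plain Java serialization as a rough stand-in for what gets shipped with each Spark task. All names here (`Table`, `ReadBuilder`, `OriginTable`, `KnownSplitsTable`, `builderSize`) are hypothetical, simplified stand-ins and not Paimon's real interfaces; splits are modeled as placeholder strings. The `delegateToOrigin` flag switches between the capturing default (`new ReadBuilder(this)`) and delegating to the origin table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SplitCaptureDemo {

    // Hypothetical, simplified stand-in for a table; NOT Paimon's real Table interface.
    interface Table extends Serializable {
        ReadBuilder newReadBuilder();
    }

    // Plays the role of ReadBuilderImpl: it keeps a reference to the table that created it.
    static class ReadBuilder implements Serializable {
        private final Table table;

        ReadBuilder(Table table) {
            this.table = table;
        }
    }

    // The origin table: small, carries no split list.
    static class OriginTable implements Table {
        private final String name;

        OriginTable(String name) {
            this.name = name;
        }

        @Override
        public ReadBuilder newReadBuilder() {
            return new ReadBuilder(this);
        }
    }

    // Wraps the origin table and carries every known split (modeled as strings).
    static class KnownSplitsTable implements Table {
        private final OriginTable origin;
        private final List<String> splits;
        private final boolean delegateToOrigin;

        KnownSplitsTable(OriginTable origin, List<String> splits, boolean delegateToOrigin) {
            this.origin = origin;
            this.splits = new ArrayList<>(splits);
            this.delegateToOrigin = delegateToOrigin;
        }

        @Override
        public ReadBuilder newReadBuilder() {
            // false: default-style behavior; the builder captures `this`, and with it every split.
            // true: the builder only references the origin table.
            return delegateToOrigin ? origin.newReadBuilder() : new ReadBuilder(this);
        }
    }

    // Java serialization as a rough proxy for the bytes shipped with a task.
    public static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Builds a KnownSplitsTable with numSplits placeholder splits and measures its ReadBuilder.
    public static int builderSize(int numSplits, boolean delegateToOrigin) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add("split-" + i + "-placeholder-file-path");
        }
        KnownSplitsTable table =
                new KnownSplitsTable(new OriginTable("origin"), splits, delegateToOrigin);
        return serializedSize(table.newReadBuilder());
    }

    public static void main(String[] args) {
        System.out.println("capturing builder:  " + builderSize(10_000, false) + " bytes");
        System.out.println("delegating builder: " + builderSize(10_000, true) + " bytes");
    }
}
```

In this sketch the capturing builder grows linearly with the number of splits, while the delegating builder's serialized size does not depend on the split count at all, which is the property the delegation is meant to restore.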
