wzx140 opened a new pull request, #8050:
URL: https://github.com/apache/paimon/pull/8050

   ### Purpose
   
   The default ReadonlyTable implementation of newReadBuilder creates 
ReadBuilderImpl(this), so a ReadBuilder obtained from a KnownSplitsTable 
captures that KnownSplitsTable. KnownSplitsTable contains all known splits, 
but the ReadBuilder returned from it should not carry them. Therefore, this 
change makes KnownSplitsTable delegate newReadBuilder to the origin table.
   
   This issue can happen when merging into a data-evolution table. The Spark 
reader factory holds the ReadBuilder, ReadBuilderImpl holds the 
KnownSplitsTable through its table field, and KnownSplitsTable holds the full 
split list. As a result, each Spark task can become very large because it 
indirectly serializes all known splits.
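
   The reference chain described above can be reproduced with a minimal, 
self-contained sketch. None of the classes below are Paimon's real classes: 
`Table`, `ReadBuilder`, `OriginTable`, and this `KnownSplitsTable` are 
hypothetical stand-ins that mimic only the field layout, the splits are plain 
strings, and the `delegate` flag exists solely to compare the two behaviours 
side by side. Plain Java serialization is used as an approximation of what 
happens when a task is shipped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class KnownSplitsSketch {

    // Hypothetical stand-in for a table; not Paimon's Table interface.
    interface Table extends Serializable {
        ReadBuilder newReadBuilder();
    }

    // Hypothetical stand-in for ReadBuilderImpl: it keeps its table in a field.
    static final class ReadBuilder implements Serializable {
        final Table table;

        ReadBuilder(Table table) {
            this.table = table;
        }
    }

    // The underlying table; it holds no split list.
    static final class OriginTable implements Table {
        @Override
        public ReadBuilder newReadBuilder() {
            return new ReadBuilder(this);
        }
    }

    // Wrapper that carries every known split (modelled here as strings).
    static final class KnownSplitsTable implements Table {
        final Table origin;
        final List<String> splits;
        final boolean delegate; // illustration only: toggles old vs. new behaviour

        KnownSplitsTable(Table origin, List<String> splits, boolean delegate) {
            this.origin = origin;
            this.splits = splits;
            this.delegate = delegate;
        }

        @Override
        public ReadBuilder newReadBuilder() {
            // Delegating keeps the split list out of the builder's object graph;
            // passing `this` pulls the whole list in.
            return delegate ? origin.newReadBuilder() : new ReadBuilder(this);
        }
    }

    // Builds a wrapper table holding n dummy splits.
    static Table withSplits(int n, boolean delegate) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            splits.add("split-" + i);
        }
        return new KnownSplitsTable(new OriginTable(), splits, delegate);
    }

    // Size in bytes of the object graph reachable from o under Java serialization.
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int captured = serializedSize(withSplits(10_000, false).newReadBuilder());
        int delegated = serializedSize(withSplits(10_000, true).newReadBuilder());
        System.out.println("builder capturing the wrapper: " + captured + " bytes");
        System.out.println("builder from the origin table: " + delegated + " bytes");
    }
}
```

   In this sketch the builder that captures the wrapper grows with the number 
of splits, while the delegated builder stays the same size regardless of how 
many splits the wrapper holds.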


