Re: [PR] [spark] Support split-granularity bin packing for data evolution tables [paimon]

via GitHub Tue, 02 Jun 2026 02:08:45 -0700


wayneli-vt commented on PR #8072:
URL: https://github.com/apache/paimon/pull/8072#issuecomment-4600809719


   > I found one blocker in the newly added test.
   > 
   > `BinPackingSplitsTest` fails on this PR head:
   > 
   > ```shell
   > mvn -pl paimon-spark/paimon-spark-ut -am -Pfast-build -DfailIfNoTests=false \
   >   -DwildcardSuites=org.apache.paimon.spark.BinPackingSplitsTest -Dtest=none test
   > ```
   > 
   > The failure is in `Paimon: pack data evolution splits by split granularity`:
   > 
   > ```
   > expected: <2> but was: <3>
   > at BinPackingSplitsTest.scala:90
   > ```
   > 
   > The test comment assumes each split is 60B and two splits fit into the
   > 150B target, but `computeMaxSplitBytes` also derives the effective max split
   > size from `SplitUtils.splitSize + fileCount * openCost` and
   > `spark.sql.files.minPartitionNum` / default parallelism. With the current
   > config, the effective max size is smaller than 150B, so every split stays in
   > its own partition.
   > 
   > Could you please either set `spark.sql.files.minPartitionNum -> "1"` in
   > this test or adjust the test sizes/expectation so the suite passes reliably?
   
   Thanks for your review! I updated the test to explicitly set
   `spark.sql.files.minPartitionNum` to `1`.
   
   Since `computeMaxSplitBytes` prefers `filesMinPartitionNum` over
   `leafNodeDefaultParallelism`, setting the former makes the effective max
   split size deterministic instead of dependent on the parallelism of the test
   environment. I also updated the related configurations in
   `BinPackingSplitsTest` for consistency.
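
   To make the arithmetic concrete, here is a minimal standalone sketch of why
   the partition count changes from 3 to 2. It is not the Paimon
   implementation: `effectiveMaxSplitBytes` and `packedPartitionCount` are
   hypothetical helpers that only mirror the shape of Spark's
   `FilePartition.maxSplitBytes` formula and a simple next-fit packing. The
   three 60B splits and the 150B target come from the test comment; one file
   per split, an open cost of 0, and a parallelism of 2 are assumptions chosen
   for illustration.

   ```scala
   object EffectiveMaxSplitBytes {

     // Hypothetical helper: min(target, max(openCost, totalBytes / minPartitionNum)),
     // where each split is weighted as splitSize + fileCount * openCost.
     def effectiveMaxSplitBytes(
         splitSizes: Seq[Long],
         filesPerSplit: Int,
         openCost: Long,
         targetBytes: Long,
         minPartitionNum: Int): Long = {
       val totalBytes = splitSizes.map(_ + filesPerSplit * openCost).sum
       val bytesPerCore = totalBytes / minPartitionNum
       math.min(targetBytes, math.max(openCost, bytesPerCore))
     }

     // Simplified next-fit packing: start a new partition whenever adding the
     // next split would exceed maxBytes. Open cost is ignored here (assumed 0).
     def packedPartitionCount(splitSizes: Seq[Long], maxBytes: Long): Int = {
       var closed = 0
       var current = 0L
       var nonEmpty = false
       splitSizes.foreach { size =>
         if (nonEmpty && current + size > maxBytes) {
           closed += 1
           current = 0L
         }
         current += size
         nonEmpty = true
       }
       if (nonEmpty) closed + 1 else closed
     }

     def main(args: Array[String]): Unit = {
       val splits = Seq(60L, 60L, 60L)

       // Assumed parallelism of 2: 180 / 2 = 90 < 150, so no two splits fit together.
       val unpinned = effectiveMaxSplitBytes(splits, 1, 0L, 150L, 2)
       println(s"$unpinned -> ${packedPartitionCount(splits, unpinned)}") // 90 -> 3

       // minPartitionNum pinned to 1: 180 / 1 = 180, capped at the 150B target.
       val pinned = effectiveMaxSplitBytes(splits, 1, 0L, 150L, 1)
       println(s"$pinned -> ${packedPartitionCount(splits, pinned)}") // 150 -> 2
     }
   }
   ```

   With the value pinned to `1`, the per-core share can no longer drop below
   the 150B target for this input, so the expected two partitions no longer
   depend on how many cores the test machine reports.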


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

