wayneli-vt commented on PR #8072: URL: https://github.com/apache/paimon/pull/8072#issuecomment-4600809719
> I found one blocker in the newly added test.
>
> `BinPackingSplitsTest` fails on this PR head:
>
> ```shell
> mvn -pl paimon-spark/paimon-spark-ut -am -Pfast-build -DfailIfNoTests=false \
>   -DwildcardSuites=org.apache.paimon.spark.BinPackingSplitsTest -Dtest=none test
> ```
>
> The failure is in `Paimon: pack data evolution splits by split granularity`:
>
> ```
> expected: <2> but was: <3>
>   at BinPackingSplitsTest.scala:90
> ```
>
> The test comment assumes each split is 60B and two splits fit into the 150B target, but `computeMaxSplitBytes` also derives the effective max split size from `SplitUtils.splitSize + fileCount * openCost` and `spark.sql.files.minPartitionNum` / default parallelism. With the current config, the effective max size is smaller than 150B, so every split stays in its own partition.
>
> Could you please either set `spark.sql.files.minPartitionNum -> "1"` in this test or adjust the test sizes/expectation so the suite passes reliably?

Thanks for your review! I updated the test to explicitly set `spark.sql.files.minPartitionNum` to `1`. Since `computeMaxSplitBytes` prefers `filesMinPartitionNum` over `leafNodeDefaultParallelism`, setting the former makes the expected effective max split size deterministic. I also updated the related configurations in `BinPackingSplitsTest` for consistency.
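For anyone following the arithmetic, here is a rough, standalone sketch of why the partition count flips from 3 to 2. It is not the code in this PR: it assumes `computeMaxSplitBytes` follows the same shape as Spark's `FilePartition.maxSplitBytes`, that is `min(target, max(openCost, totalBytes / minPartitionNum))`, and the names `Split`, `effectiveMaxSplitBytes`, and `packedPartitionCount` are hypothetical. The inputs (three 60B single-file splits, open cost 0, default parallelism 2) are also assumptions chosen only to reproduce the reported `expected: <2> but was: <3>`.

```scala
object MaxSplitBytesSketch {
  // Hypothetical stand-in for a split: its byte size and the number of files it covers.
  final case class Split(sizeBytes: Long, fileCount: Int)

  // Assumed formula, modelled on Spark's FilePartition.maxSplitBytes:
  // the configured target is capped by the bytes available per partition slot.
  def effectiveMaxSplitBytes(
      splits: Seq[Split],
      targetMaxBytes: Long,
      openCostBytes: Long,
      minPartitionNum: Option[Int], // spark.sql.files.minPartitionNum, if set
      defaultParallelism: Int): Long = {
    // An explicit minPartitionNum takes precedence over the default parallelism.
    val partitions = minPartitionNum.getOrElse(defaultParallelism)
    val totalBytes = splits.map(s => s.sizeBytes + s.fileCount * openCostBytes).sum
    val bytesPerPartition = totalBytes / partitions
    math.min(targetMaxBytes, math.max(openCostBytes, bytesPerPartition))
  }

  // Simple greedy packing: start a new partition when the next split would overflow.
  def packedPartitionCount(splits: Seq[Split], maxBytes: Long, openCostBytes: Long): Int = {
    var closed = 0
    var currentBytes = 0L
    var currentNonEmpty = false
    splits.foreach { s =>
      val weight = s.sizeBytes + s.fileCount * openCostBytes
      if (currentNonEmpty && currentBytes + weight > maxBytes) {
        closed += 1
        currentBytes = 0L
        currentNonEmpty = false
      }
      currentBytes += weight
      currentNonEmpty = true
    }
    if (currentNonEmpty) closed + 1 else closed
  }

  def main(args: Array[String]): Unit = {
    val splits = Seq.fill(3)(Split(sizeBytes = 60L, fileCount = 1))

    // minPartitionNum unset: the assumed default parallelism of 2 shrinks the cap to 90B.
    val unsetMax = effectiveMaxSplitBytes(splits, 150L, 0L, None, 2)
    println(s"unset: max=$unsetMax partitions=${packedPartitionCount(splits, unsetMax, 0L)}")
    // prints "unset: max=90 partitions=3"

    // minPartitionNum = 1: the cap stays at the 150B target, so two 60B splits share a partition.
    val pinnedMax = effectiveMaxSplitBytes(splits, 150L, 0L, Some(1), 2)
    println(s"pinned: max=$pinnedMax partitions=${packedPartitionCount(splits, pinnedMax, 0L)}")
    // prints "pinned: max=150 partitions=2"
  }
}
```

Under these assumptions, the result no longer depends on how many cores the test machine reports once `minPartitionNum` is pinned, which is why the explicit setting makes the expectation stable.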
