leaves12138 opened a new pull request, #7852: URL: https://github.com/apache/paimon/pull/7852
## What changed

- Reworked Flink BTree global index building to use one task-driven topology for all contiguous row ranges instead of building one topology per range.
- Added an internal build task id to the sort key so each range keeps its own row-range metadata while sharing the same Flink source/read/sort/write chain.
- Added coverage for parallelism calculation, many small ranges, and a single large range split across multiple writer subtasks.

## Why

When row ranges are highly fragmented, the old implementation creates a separate Flink topology for each range. That can make the create-index procedure spend a long time constructing the JobGraph and can produce an oversized topology.

## Validation

- `mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BTreeIndexTopoBuilderTest test`
- `mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithManyPartitions test`
- `mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithSingleRangeAndParallelWriters test`
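
## Illustrative sketch

The build-task-id idea described under "What changed" can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `TaskKeyedSortSketch`, `RowRange`, `TaggedRow`, `taskIdFor`, and `tagAndSort` are hypothetical names, and a plain in-memory sort stands in for the Flink sort operator. The sketch only shows why prefixing the sort key with a task id lets many ranges pass through one shared sort while each range's rows stay grouped and key-ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustration only; these are hypothetical types, not Paimon classes. */
final class TaskKeyedSortSketch {

    /** A contiguous row range [from, to], both inclusive. */
    static final class RowRange {
        final long from;
        final long to;

        RowRange(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** A row tagged with the id of the build task (range) it belongs to. */
    static final class TaggedRow {
        final int taskId;
        final String indexKey;
        final long rowId;

        TaggedRow(int taskId, String indexKey, long rowId) {
            this.taskId = taskId;
            this.indexKey = indexKey;
            this.rowId = rowId;
        }
    }

    /** Composite sort key: task id first, then the indexed key. */
    static final Comparator<TaggedRow> SORT_KEY = (a, b) -> {
        int byTask = Integer.compare(a.taskId, b.taskId);
        return byTask != 0 ? byTask : a.indexKey.compareTo(b.indexKey);
    };

    /** Assumption: the task id is simply the position of the containing range. */
    static int taskIdFor(List<RowRange> ranges, long rowId) {
        for (int i = 0; i < ranges.size(); i++) {
            RowRange r = ranges.get(i);
            if (rowId >= r.from && rowId <= r.to) {
                return i;
            }
        }
        throw new IllegalArgumentException("row " + rowId + " is outside all ranges");
    }

    /** Tags every row with its task id, then sorts all ranges in a single pass. */
    static List<TaggedRow> tagAndSort(List<RowRange> ranges, long[] rowIds, String[] keys) {
        List<TaggedRow> rows = new ArrayList<>();
        for (int i = 0; i < rowIds.length; i++) {
            rows.add(new TaggedRow(taskIdFor(ranges, rowIds[i]), keys[i], rowIds[i]));
        }
        rows.sort(SORT_KEY);
        return rows;
    }
}
```

After `tagAndSort`, rows with the same task id are adjacent, so a downstream writer could close one index file and open the next whenever the task id changes, without needing a separate pipeline per range.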
