leaves12138 opened a new pull request, #7852:
URL: https://github.com/apache/paimon/pull/7852

   ## What changed
   
   - Reworked Flink BTree global index building to use one task-driven topology for all contiguous row ranges instead of building one topology per range.
   - Added an internal build task id to the sort key so each range keeps its own row-range metadata while sharing the same Flink source/read/sort/write chain.
   - Added coverage for parallelism calculation, many small ranges, and a single large range split across multiple writer subtasks.
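
To illustrate the second point, here is a minimal, framework-free sketch of why prefixing the sort key with a task id lets many ranges share one sort. It is not the code in this PR: `TaskKeyedSortSketch`, `KeyedRow`, `sortByTaskThenKey`, and `groupByTask` are hypothetical names, and plain Java collections stand in for the Flink operators.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaskKeyedSortSketch {

    /** Hypothetical row: the build task it belongs to, the indexed value, and its row id. */
    static final class KeyedRow {
        final int taskId;
        final String indexKey;
        final long rowId;

        KeyedRow(int taskId, String indexKey, long rowId) {
            this.taskId = taskId;
            this.indexKey = indexKey;
            this.rowId = rowId;
        }
    }

    /** Sort by task id first, then by the indexed value, so each task's rows stay contiguous. */
    static List<KeyedRow> sortByTaskThenKey(List<KeyedRow> rows) {
        List<KeyedRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.<KeyedRow>comparingInt(r -> r.taskId)
                .thenComparing(r -> r.indexKey));
        return sorted;
    }

    /** Walk the sorted rows once and collect the row ids of each task in sorted-key order. */
    static Map<Integer, List<Long>> groupByTask(List<KeyedRow> sorted) {
        Map<Integer, List<Long>> perTask = new LinkedHashMap<>();
        for (KeyedRow row : sorted) {
            perTask.computeIfAbsent(row.taskId, id -> new ArrayList<>()).add(row.rowId);
        }
        return perTask;
    }

    public static void main(String[] args) {
        // Rows from two row ranges (tasks 0 and 1) arrive interleaved in one shared stream.
        List<KeyedRow> rows = new ArrayList<>();
        rows.add(new KeyedRow(1, "b", 100L));
        rows.add(new KeyedRow(0, "z", 3L));
        rows.add(new KeyedRow(1, "a", 101L));
        rows.add(new KeyedRow(0, "c", 7L));
        System.out.println(groupByTask(sortByTaskThenKey(rows)));
    }
}
```

Because the task id is the leading component of the key, a single sort never interleaves rows from different ranges, so a downstream writer can detect a task boundary just by watching the id change.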
   
   ## Why
   
   When row ranges are highly fragmented, the old implementation creates a separate Flink topology for each range. As a result, the create-index procedure can spend a long time constructing the JobGraph, and the resulting topology can become oversized.
   
   ## Validation
   
   - `mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BTreeIndexTopoBuilderTest test`
   - `mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithManyPartitions test`
   - `mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithSingleRangeAndParallelWriters test`
   


