ggjh-159 opened a new pull request, #12304:
URL: https://github.com/apache/gluten/pull/12304

   ## What changes are proposed in this pull request?
   
   Fixes #12298.

   Depends on: [bigo-sg/velox4j#36](https://github.com/bigo-sg/velox4j/pull/36), [bigo-sg/velox#45](https://github.com/bigo-sg/velox/pull/45)
   
   This PR consumes the `ParallelSplit` abstraction introduced by the companion 
velox4j PR:
   
   - `NexmarkSourceFactory.buildVeloxSource` iterates over all per-subtask 
splits from `getSplits(parallelism)` and packs them into a single 
`NexmarkConnectorSplit` whose `subtaskSplits` carries one entry per subtask.
   - `GlutenSourceFunction.initSession` detects `ParallelSplit` with 
`parallelism > 1` and selects the per-subtask split via 
`getSubtaskSplit(subtaskIndex, totalParallelism)`; otherwise it behaves as 
before.
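
The selection logic in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SubtaskSplitSketch`, `PackedSplit`, and `selectSplit` are hypothetical stand-ins, and the `ParallelSplit` interface shown here only assumes the `getSubtaskSplit(subtaskIndex, totalParallelism)` method named above. The real velox4j and Gluten types may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real types live in velox4j and Gluten and may differ.
final class SubtaskSplitSketch {

    /** Assumed shape of the ParallelSplit abstraction (only the method named in the PR text). */
    interface ParallelSplit<T> {
        T getSubtaskSplit(int subtaskIndex, int totalParallelism);
    }

    /** One packed split holding one entry per subtask, in the spirit of `subtaskSplits`. */
    static final class PackedSplit implements ParallelSplit<String> {
        private final List<String> subtaskSplits;

        PackedSplit(List<String> subtaskSplits) {
            this.subtaskSplits = new ArrayList<>(subtaskSplits);
        }

        @Override
        public String getSubtaskSplit(int subtaskIndex, int totalParallelism) {
            if (totalParallelism != subtaskSplits.size()) {
                throw new IllegalArgumentException("parallelism does not match packed split count");
            }
            if (subtaskIndex < 0 || subtaskIndex >= totalParallelism) {
                throw new IllegalArgumentException("subtask index out of range: " + subtaskIndex);
            }
            return subtaskSplits.get(subtaskIndex);
        }
    }

    /** Picks the per-subtask split when applicable; otherwise returns the split unchanged. */
    static Object selectSplit(Object split, int subtaskIndex, int parallelism) {
        if (split instanceof ParallelSplit<?> && parallelism > 1) {
            return ((ParallelSplit<?>) split).getSubtaskSplit(subtaskIndex, parallelism);
        }
        return split; // non-parallel split or parallelism == 1: behave as before
    }
}
```

With two packed entries, `selectSplit(packed, 1, 2)` returns the second entry, while `selectSplit(packed, 0, 1)` returns the packed split itself, matching the "otherwise it behaves as before" branch.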
   
   ## How was this patch tested?
   
   Ran Nexmark query q0 manually on a standalone Flink cluster with
`parallelism.default = 2`, `events.num = 10000`, and `tps = 2000`.
   
   | Signal | Before | After |
   |---|---|---|
   | Bid input rows | ~18400 (duplicated) | 9200 (no duplicates) |
   | `dateTime` span | single timestamp | ~5 seconds |
   | Subtask splits | both share full range | subtask 0 `firstEventId=1, maxEvents=5000`; subtask 1 `firstEventId=5001, maxEvents=5000` |
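
The per-subtask ranges in the last row are what an even, contiguous partition of the event ID space would produce. The sketch below reproduces that arithmetic under stated assumptions: IDs start at 1, ranges are contiguous, and any remainder goes to the lowest-indexed subtasks. `RangeSplitSketch` and `rangeFor` are hypothetical names, and the actual Nexmark split generator may divide events differently.

```java
// Hypothetical illustration of an even, contiguous split of the event ID range.
final class RangeSplitSketch {

    /** A subtask's slice: first event ID (1-based) and how many events it owns. */
    static final class Range {
        final long firstEventId;
        final long maxEvents;

        Range(long firstEventId, long maxEvents) {
            this.firstEventId = firstEventId;
            this.maxEvents = maxEvents;
        }
    }

    /** Assumes IDs start at 1 and any remainder is given to the lowest subtask indices. */
    static Range rangeFor(long totalEvents, int parallelism, int subtaskIndex) {
        if (parallelism < 1 || subtaskIndex < 0 || subtaskIndex >= parallelism) {
            throw new IllegalArgumentException("invalid subtask index or parallelism");
        }
        long base = totalEvents / parallelism;
        long remainder = totalEvents % parallelism;
        // Subtasks below `remainder` each take one extra event.
        long count = base + (subtaskIndex < remainder ? 1 : 0);
        long offset = subtaskIndex * base + Math.min(subtaskIndex, remainder);
        return new Range(1 + offset, count);
    }
}
```

For `totalEvents = 10000` and `parallelism = 2`, `rangeFor` yields `firstEventId=1, maxEvents=5000` for subtask 0 and `firstEventId=5001, maxEvents=5000` for subtask 1, the same values as the table. The 9200 Bid rows would be consistent with a mix of 46 bids per 50 events, though the proportions used in this run are not stated here.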
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]