MisterRaindrop commented on PR #1571:
URL: https://github.com/apache/cloudberry/pull/1571#issuecomment-3950109069

   > > Are the variables ParallelWorkerNumberOfSlice and 
TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB 
parallel framework?
   > 
   > Yes, they are stable and reliable under the current CBDB parallel 
framework. But I'm not sure how you plan to use them.
   > 
   > > During execution, the FDW calculates the virtual segment ID based on 
these two values, modifies the HTTP header sent to PXF, and allows PXF's 
round-robin sharding mechanism to automatically distribute data evenly among 
all gang workers.
   > 
   > I'm not entirely sure I follow. Isn't this essentially how MPP PXF works 
today, except for `the virtual segment ID based on these two values`? Offhand, 
I don't think those two values are enough: different slices on the same segment 
could have the same parallel workers.
   
   Yes, essentially. It reuses PXF's existing MPP round-robin sharding 
mechanism: by modifying the segment ID and segment count in the HTTP header, 
PXF distributes data across N×W gang workers instead of N physical segments. 
No changes are required on the PXF server side.
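
   To make the idea concrete, here is a minimal sketch of the mapping in Python 
(not the FDW's actual C code). The formula `seg_id * W + worker_no`, the helper 
names, and the simplified `i % count == id` round-robin rule are assumptions 
made for illustration; the header names `X-GP-SEGMENT-ID` and 
`X-GP-SEGMENT-COUNT` are assumed to be the ones PXF reads.

```python
def virtual_segment(seg_id, seg_count, worker_no, workers_per_seg):
    """Map (physical segment, parallel worker) to (virtual ID, virtual count).

    Hypothetical helper: worker_no stands in for ParallelWorkerNumberOfSlice,
    workers_per_seg for TotalParallelWorkerNumberOfSlice.
    """
    if not 0 <= seg_id < seg_count:
        raise ValueError("seg_id out of range")
    if not 0 <= worker_no < workers_per_seg:
        raise ValueError("worker_no out of range")
    # Assumed layout: each physical segment owns a contiguous block of W IDs.
    return seg_id * workers_per_seg + worker_no, seg_count * workers_per_seg


def pxf_headers(seg_id, seg_count, worker_no, workers_per_seg):
    """Build the two headers that would be rewritten (names are assumptions)."""
    vid, vcount = virtual_segment(seg_id, seg_count, worker_no, workers_per_seg)
    return {"X-GP-SEGMENT-ID": str(vid), "X-GP-SEGMENT-COUNT": str(vcount)}


def fragments_for(virtual_id, virtual_count, fragment_count):
    """Simplified round-robin model: fragment i goes to ID i % virtual_count."""
    return [i for i in range(fragment_count) if i % virtual_count == virtual_id]
```

   With N = 3 segments and W = 2 workers per segment, the six workers get 
virtual IDs 0 through 5 and a virtual count of 6, so under this model each 
fragment is read by exactly one worker.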
   
   Regarding ParallelWorkerNumberOfSlice: from the assignment logic in 
parallel.c, workers on the same segment are numbered incrementally through the 
DSM entry (0, 1, 2, ...), so the values should be unique. However, I want to 
confirm: in the CBDB parallel framework, is this value guaranteed to be unique 
within the same slice on the same segment?
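
   The property in question can be stated as a small check. This is only an 
illustrative Python sketch: the `(slice_id, seg_id, worker_no)` tuples, the 
function name, and the `seg_id * W + worker_no` formula are assumptions, not 
CBDB code. It reports any virtual segment ID that appears more than once 
within a slice, which is what would happen if worker numbers were not unique 
per slice and segment.

```python
from collections import Counter


def duplicate_virtual_ids(workers, workers_per_seg):
    """Return {slice_id: [virtual IDs seen more than once in that slice]}.

    workers: iterable of (slice_id, seg_id, worker_no) tuples (hypothetical
    stand-ins for the slice, the segment, and ParallelWorkerNumberOfSlice).
    """
    counts = Counter(
        (slice_id, seg_id * workers_per_seg + worker_no)
        for slice_id, seg_id, worker_no in workers
    )
    duplicates = {}
    for (slice_id, virtual_id), seen in counts.items():
        if seen > 1:
            duplicates.setdefault(slice_id, []).append(virtual_id)
    return {slice_id: sorted(ids) for slice_id, ids in duplicates.items()}
```

   An empty result means every worker in every slice would send a distinct 
virtual segment ID. Two slices on the same segment may still reuse the same 
worker numbers; the check treats each slice separately, so that case is not 
reported as a duplicate.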


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

