shauryachats commented on PR #18433:
URL: https://github.com/apache/pinot/pull/18433#issuecomment-4579885435

   > I do not understand this example and the argument. The partitions 0 and 
10000 belong to two different streams. Although both of them is the partition 0 
of their streams, why do they need to be colocated? 
   
   Good question - let me clarify the colocation argument.
     
     In this specific setup, both streams are co-partitioned by the same key
     (trace_id). That means stream 0 partition 0 and stream 1 partition 0
     contain data for the same set of trace IDs (those where
     trace_id % 3 == 0). Colocating them on the same server means a query
     filtering by a specific trace_id can be served entirely locally,
     without scatter-gathering across multiple server groups.
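
A minimal sketch of the co-partitioning argument above. It assumes integer trace IDs, a plain modulo partitioner, and that the Pinot partition ID of stream N is offset by N * 10000 (inferred from the 0 / 10000 IDs in this thread); `STREAM_ID_STRIDE`, `stream_partition`, and `pinot_partition_id` are hypothetical names, not Pinot APIs.

```python
NUM_STREAM_PARTITIONS = 3   # each stream has 3 partitions in this example
STREAM_ID_STRIDE = 10000    # assumption: per-stream offset of the Pinot partition ID


def stream_partition(trace_id: int) -> int:
    """Stream-level partition for a trace ID (same partitioner on both streams)."""
    return trace_id % NUM_STREAM_PARTITIONS


def pinot_partition_id(stream_index: int, trace_id: int) -> int:
    """Pinot-level partition ID: stream offset plus stream-level partition."""
    return stream_index * STREAM_ID_STRIDE + stream_partition(trace_id)


trace_id = 42  # 42 % 3 == 0, so it lands in partition 0 of both streams
ids = [pinot_partition_id(stream, trace_id) for stream in (0, 1)]
print(ids)  # [0, 10000]
```

Under these assumptions, Pinot partitions 0 and 10000 hold the same trace IDs, which is why a trace_id lookup benefits from both being on the same server.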
   
     That said, you're right that if the two streams have no relationship
     between their partition keys, colocation across streams wouldn't be
     a requirement.
   
     More fundamentally, though, the fix is needed for correct instance
     assignment, independent of the colocation argument. With
     numPartitions: 3 configured, the intent is:
     - stream partition 0 → instance group 0
     - stream partition 1 → instance group 1
     - stream partition 2 → instance group 2
     
     Without the fix, stream 1's segments (Pinot partition IDs 10000,
     10001, 10002 for stream partitions 0, 1, 2) get assigned via the raw
     Pinot partition ID:
     - 10000 % 3 = 1 → instance group 1 (wrong, should be 0)
     - 10001 % 3 = 2 → instance group 2 (wrong, should be 1)
     - 10002 % 3 = 0 → instance group 0 (wrong, should be 2)
     
     This produces a shifted mapping that doesn't match what the user
     configured: segments from stream 1 end up on instance groups that
     are inconsistent with the replicaGroupPartitionConfig. The fix
     ensures both streams use their stream-level partition ID
     consistently when computing the instance group.
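
The arithmetic above can be checked with a small sketch. It is not the PR's actual code: it assumes the instance group is chosen as partition ID modulo numPartitions and that the stream-level partition ID can be recovered as the Pinot partition ID modulo 10000; `instance_group_raw` and `instance_group_fixed` are hypothetical names.

```python
NUM_PARTITIONS = 3        # numPartitions from the example config
STREAM_ID_STRIDE = 10000  # assumption: per-stream offset of the Pinot partition ID


def instance_group_raw(pinot_partition_id: int) -> int:
    """Behaviour without the fix: modulo on the raw Pinot partition ID."""
    return pinot_partition_id % NUM_PARTITIONS


def instance_group_fixed(pinot_partition_id: int) -> int:
    """Behaviour with the fix: strip the stream offset first (assumed scheme)."""
    stream_partition_id = pinot_partition_id % STREAM_ID_STRIDE
    return stream_partition_id % NUM_PARTITIONS


for pid in (0, 1, 2, 10000, 10001, 10002):
    print(pid, instance_group_raw(pid), instance_group_fixed(pid))
```

With these assumptions, the raw computation maps 10000, 10001, 10002 to groups 1, 2, 0, while the stream-level computation maps them to 0, 1, 2, matching stream 0.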


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

