ankitsultana opened a new issue, #12397:
URL: https://github.com/apache/pinot/issues/12397

   I haven't spent time to reproduce this particular issue yet and the 
following is based on my understanding, so do evaluate and contest any claims 
made herein.
   
   ### Issue Description
   
   When a sorted column is set for any Realtime table, the MutableSegment 
remains unsorted. During the segment-commit, we read each record from the 
MutableSegment in [the sortedDocId 
order](https://github.com/apache/pinot/blob/43dadbfd96a70c19a9ac83bb6c0c35f3fa58bffb/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/converter/RealtimeSegmentConverter.java#L122).
   
   A segment commit will be followed by an `addOrReplaceSegment` call in the 
`ConcurrentMapPartitionUpsertMetadataManager` (call it `UMM` for Upsert 
Metadata Manager), where the `oldSegment` will be set to the 
`ImmutableSegment`. This is because the UMM's primary-key map must be updated 
so that it points to the new segment, and the docId is also updated as needed.
   
   With Partial Upserts, say we had 4 events for a given primary-key in the 
Mutable Segment: `R0, R1, R2, R3`. Let's also assume that the comparison column 
value of these events is: `R0 < R1 < (R2 = R3)`.
   
   If, after applying the sorted column, their order changes to `R0, R1, R3, 
R2`, then the UMM will start pointing to `R2` as the valid doc, because `R2` 
and `R3` tie on the comparison column and `R2` now has the higher docId.
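
   The reordering can be illustrated with a small standalone sketch. This is 
not Pinot code: the `Event` class, the sample values, and the tie-breaking rule 
(a doc replaces the current valid doc when its comparison value is greater than 
or equal to the current one) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedColumnTieSketch {
  // Hypothetical event for a single primary key.
  static final class Event {
    final String name;
    final long comparisonValue; // value of the comparison column
    final int sortValue;        // value of the sorted column

    Event(String name, long comparisonValue, int sortValue) {
      this.name = name;
      this.comparisonValue = comparisonValue;
      this.sortValue = sortValue;
    }
  }

  // Assumed resolution rule: walk docs in docId order; on a tie of the
  // comparison value, the doc with the higher docId wins.
  static String validDoc(List<Event> docsInDocIdOrder) {
    Event valid = null;
    for (Event e : docsInDocIdOrder) {
      if (valid == null || e.comparisonValue >= valid.comparisonValue) {
        valid = e;
      }
    }
    return valid.name;
  }

  // Stand-in for the commit-time rewrite: docs are re-ordered by the sorted column.
  static List<Event> sortBySortedColumn(List<Event> docs) {
    List<Event> sorted = new ArrayList<>(docs);
    sorted.sort(Comparator.comparingInt((Event e) -> e.sortValue));
    return sorted;
  }

  // R0 < R1 < (R2 = R3) on the comparison column; R3 sorts before R2.
  static List<Event> sampleEvents() {
    return Arrays.asList(
        new Event("R0", 10, 1),
        new Event("R1", 20, 2),
        new Event("R2", 30, 4),
        new Event("R3", 30, 3));
  }

  public static void main(String[] args) {
    List<Event> mutable = sampleEvents();
    List<Event> committed = sortBySortedColumn(mutable);
    System.out.println("valid doc before commit: " + validDoc(mutable));
    System.out.println("valid doc after commit:  " + validDoc(committed));
  }
}
```

   Under these assumptions the valid doc is `R3` in ingestion order and `R2` 
once the docs are re-ordered by the sorted column.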
   
   To summarize: if a Partial Upsert table has a sorted column, users must 
make sure that the events for a given primary key are emitted with strictly 
increasing comparison column values.
   
   You may ask "What about Full Upsert tables?"
   
   This is not an issue for Full Upsert tables in my opinion, because Pinot 
could say that in case of ties of comparison column values, any of the records 
may be picked as the latest record.
   
   For Partial Upsert tables, from a user perspective, the bug above will be 
seen as random events being dropped and not applied to the Partial Upsert 
merger, which would lead to inconsistent data.
   
   ### Discussion
   
   What should be the follow-up here? Some options I see are:
   
   * Option-1: Say this is a known gap and users must make sure that their 
events for a primary key are in strictly increasing order if they want to use a 
sorted column. In that case I think we should at least consider calling this 
out explicitly in the 
[Runbook](https://docs.pinot.apache.org/basics/data-import/upsert#:~:text=When%20two%20records,to%20sort%20by.).
   * Option-2: Add a validation to prevent users from setting a sorted column 
with Partial Upsert tables.
   * Option-3: Update implementation to handle this case.
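
   As a rough illustration of Option-2, the check could look like the sketch 
below. The `UpsertMode` enum and the `validate` method are hypothetical and do 
not correspond to Pinot's actual table config classes or validation hooks.

```java
import java.util.List;

public class PartialUpsertSortedColumnCheck {
  // Hypothetical stand-in for the table's upsert mode.
  enum UpsertMode { NONE, FULL, PARTIAL }

  // Rejects a table config that combines partial upsert with a sorted column.
  static void validate(UpsertMode mode, List<String> sortedColumns) {
    boolean hasSortedColumn = sortedColumns != null && !sortedColumns.isEmpty();
    if (mode == UpsertMode.PARTIAL && hasSortedColumn) {
      throw new IllegalStateException(
          "Sorted column " + sortedColumns + " is not supported with partial upsert tables");
    }
  }
}
```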
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]