tibrewalpratik17 opened a new pull request, #13347:
URL: https://github.com/apache/pinot/pull/13347

   Labels:
   `enhancement`
   `upsert`
   
   ### Problem statement
   
   In #12037, we added support for removing deleted keys and the validDocId for 
the deleted records after a TTL window. This ensured that deleted records would 
eventually be picked up by the compaction flow.
   In #13342, we directly enabled compaction to take care of deleted rows.
   
   In both scenarios, we could end up in a situation where an older non-deleted 
record for a particular key is not compacted, but the deleted record is. During 
a server restart, when all segments are loaded, Pinot would incorrectly mark 
the record as non-deleted and start returning it as a valid primary key, 
leading to an inconsistent state in the table.
   
   **Example**:
   
   Consider a primary key PK1 with records in segments S0 and S1. In S1, the 
record is marked as deleted. If S1 gets compacted but S0 doesn't due to 
threshold reasons in the upsert-compaction flow, during a server restart, the 
upsert-metadata-manager map would incorrectly point PK1 to S0, even though it 
should be considered deleted for the end-user.
   
   **Why not use enablePreload?**
   While enablePreload might prevent loading previous invalid records, it is 
not foolproof. Valid-doc-id snapshots are not always available due to scenarios 
like fetching segments from deepstore or upsert-compaction refreshing the 
segment. If a validDocID snapshot is missing and enablePreload is set to true, 
the segment loads normally, iterating through records and would potentially 
end-up reviving a deleted primary key with an old value with the present flow.
   
   ### Proposal
   
   This PR aims to maintain the state of PK -> distinct-segment-count. This 
means tracking the number of segments in which a record exists for a given 
primary key. If the count is <= 1, we can compact the deleted record, ensuring 
that all other records in other segments are removed.
   
   Changes required in the following upsert methods:
   - `addOrReplaceSegment`: When adding a segment, increment the counter by 1. 
For replacing a segment, the removeSegment flow will decrement the count by 1.
   - `removeSegment`: This method needs to decrement the count by 1 for all 
keys present in the segment, which requires more intrusive changes as we 
currently only iterate valid-doc-id records.
   - `addRecord` : Here, we need to check if the last pointed record is from 
the same consuming segment and increment the value accordingly. This might need 
better checks with `dropOutOfOrder` config enabled.
   
   **Inconsistent with enablePreload**
   
   This proposal might not work with `enablePreload`, as `enablePreload` only 
iterates validDocIDs during segment-addition, ultimately missing counts for 
other invalidDocIDs PKs in the upsert-map. 
   We could enable this proposal behind a new config, 
`enableDeletionConsistencyWithCompaction`, and ensure that this config cannot 
be used simultaneously with `enablePreload`. Additionally, we can use this 
boolean flag in the `removeSegment` flow to decide whether to iterate the 
entire segment or just the valid-doc-ids.
   
   cc @klsince @Jackie-Jiang raising this draft PR for early feedback on the 
proposal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to