Jackie-Jiang commented on PR #17380:
URL: https://github.com/apache/pinot/pull/17380#issuecomment-3667668660

   > > We need to discuss when we should use data CRC instead of index CRC, and 
what the side effects are. When using data CRC, index-only changes in the deep 
store (i.e. a new index added) won't be honored. This could prevent users from 
creating indexes from minion to reduce index creation on the server. Given we 
want to solve the problem of committed real-time segments potentially having 
different CRCs, I feel a better way to address this is to add a flag in ZK 
metadata to indicate that we can check only the data CRC. This flag would exist 
only on committed segments, not on segments pushed from other ingestion flows.
   > 
   > Thanks @Jackie-Jiang, we can do that. To confirm my understanding: in the 
current code, I'm only using data CRC in `doAddOnlineSegment` of the classes 
`OfflineTableDataManager` and `RealtimeTableDataManager`, which are called in 
the Helix state transitions `onBecomeOnlineFromConsuming` and 
`onBecomeOnlineFromOffline`.
   > 
   > For the other flows, such as reload segment and replace segments (used by 
minions), data CRC is not used. So, in a way, is the code already handling this 
point? Or are there other flows that might be accidentally included in this?
   
   These Helix state transitions apply to many scenarios, not only to 
committed segments. E.g. when a server starts, all segments are loaded through 
these 2 state transitions, so checking only the data CRC there would make the 
server ignore changes applied by minions.
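
To make the proposal concrete, here is a minimal sketch of a flag-gated CRC check. It is not code from the PR or from Pinot: the class `SegmentCrcCheck`, the holder `CrcInfo`, the method `needsDownload`, and the `dataCrcOnly` parameter (standing in for the proposed ZK metadata flag) are all hypothetical names used only for illustration.

```java
// Hypothetical sketch, not Pinot code: every name here is an assumption.
final class SegmentCrcCheck {
  private SegmentCrcCheck() {
  }

  /** Hypothetical holder for the two CRC values discussed in this thread. */
  static final class CrcInfo {
    final long indexCrc; // changes when data or any index changes
    final long dataCrc;  // changes only when the underlying data changes

    CrcInfo(long indexCrc, long dataCrc) {
      this.indexCrc = indexCrc;
      this.dataCrc = dataCrc;
    }
  }

  /**
   * Decides whether the local copy should be replaced by the deep store copy.
   *
   * @param local CRCs of the segment copy on the server
   * @param zk CRCs recorded in ZK metadata
   * @param dataCrcOnly assumed ZK metadata flag, set only on committed segments
   */
  static boolean needsDownload(CrcInfo local, CrcInfo zk, boolean dataCrcOnly) {
    if (dataCrcOnly) {
      // Committed segment: replicas may build different indexes, so only the
      // data CRC is compared.
      return local.dataCrc != zk.dataCrc;
    }
    // Any other ingestion flow: compare the index CRC so that index-only
    // changes (e.g. an index added by a minion) are still honored.
    return local.indexCrc != zk.indexCrc;
  }
}
```

With the flag carried per segment, the decision no longer depends on which state transition triggered the load, so a server restart would still pick up minion changes for segments that lack the flag.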


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

