eaugene commented on issue #13149:
URL: https://github.com/apache/pinot/issues/13149#issuecomment-2110705587
Thanks, @cbalci for drafting this message.
We have faced this issue twice in the last month.
For the first instance, I did the debugging a couple of weeks back; below is the series of events (reconstructed from the log dump) that put the segments into the undesirable state.
**Setup**
- Controller (C)
- Servers S1 and S2, both consuming the same segment (X) from Kafka
- Pinot version: 0.12
**Series of Events in Increasing Order of Time**
| Time | C | S1 | S2 |
|--------|--------|--------|--------|
| T1 | Segment X has started consuming | - | - |
| T2 | Committing segment X with winner S1 | - | - |
| T3 | - | Segment tar built | - |
| T4 | - | Failure to upload to deep store | - |
| T5 | Updated segment metadata and ideal state are set for this segment | - | - |
| T6 | - | - | `SegmentOnlineOfflineStateModel.onBecomeOnlineFromConsuming()` got called. This internally tried to catch up to the offset of segment X, but failed due to timeout, so it tried to download from a peer |
| T7 | - | - | Failed to download segment X after retries.<br>Log: "Failure in getting online servers for segment table_X"<br>The peer download has an exponential retry policy, but it could not succeed because it could not find an ONLINE copy in the external view: the controller had only set the ideal state (at T5), and the external view had not transitioned yet.<br>The segment goes into ERROR state on this node (added to the error cache), but still has mmap files in the consumer dir.<br>Log: "Caught exception in state transition CONSUMING -> ONLINE for table: table_X, segment: X"<br>There is a finally clause that releases segments (https://github.com/apache/pinot/blob/8a4f7a0ea7324039327a28257e60899caadc594a/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentOnlineOfflineStateModelFactory.java#L120), but it only executes when there is no reference on the segment, and a query in flight can hold such a reference |
| T8 | Reload segment call from controller to both servers | - | - |
| T9 | - | Reload succeeds as it has a local copy | Log: "Reloading (force committing) consuming segment: X in table: table_X"<br>This force commit only sets `RealtimeSegmentDataManager._forceCommitMessageReceived` to true.<br>The segment is still in ERROR on this instance. Up to here we still have mmap files in the consumer dir |
| T10 | Reset segment X on S2 | - | - |
| T11 | - | - | Log: "Skipping adding existing segment: X for table: table_X with data manager class: RealtimeSegmentDataManager"<br>But this changes the state to ONLINE: `SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline()` is called.<br>Since the segment is ONLINE in the external view, this gave the false impression that the segment had been successfully reloaded from a peer and was being served from there.<br>In reality, the segment was still being served from the mmap files, because `_segmentDataManagerMap` still held a mutable segment for it.<br>From the code, the only transitions that remove a segment from `_segmentDataManagerMap` are CONSUMING -> OFFLINE, CONSUMING -> DROPPED, ONLINE -> OFFLINE, and ONLINE -> DROPPED. But in our case the transitions were only `SegmentOnlineOfflineStateModel.onBecomeOfflineFromError()` and `SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline()`, and `onBecomeOfflineFromError()` is just logging (https://github.com/apache/pinot/blob/8a4f7a0ea7324039327a28257e60899caadc594a/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentOnlineOfflineStateModelFactory.java#L221).<br>If we had dropped segment X from `_segmentDataManagerMap`, the later reset call would have peer-downloaded the segment |
**This is how the segment ended up in such an undesirable state on S2.**
We ended up losing the segment entirely, because the S1 container was replaced before it had even uploaded to deep store, and the S2 container was restarted (which cleared the in-memory `_segmentDataManagerMap`).
We are aware that a deep-store upload retry could have prevented the data loss.
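To make the T10/T11 behavior concrete, here is a minimal, hypothetical Java sketch; the class, enum, and map names are mine and do not mirror Pinot's actual implementation. It only models the observation above: just four transitions release a segment's data manager, so the stale mutable segment survives ERROR -> OFFLINE -> ONLINE.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model, not Pinot code: only four transitions remove a segment
// from the server's in-memory data-manager map.
public class SegmentMapSketch {
  enum State { CONSUMING, ONLINE, OFFLINE, DROPPED, ERROR }

  static final Map<String, String> segmentDataManagerMap = new ConcurrentHashMap<>();

  static void onTransition(String segment, State from, State to) {
    boolean releases =
        (from == State.CONSUMING || from == State.ONLINE)
            && (to == State.OFFLINE || to == State.DROPPED);
    if (releases) {
      segmentDataManagerMap.remove(segment);
    }
    // ERROR -> OFFLINE and OFFLINE -> ONLINE fall through here: the stale
    // mutable segment stays in the map, which is the situation at T10/T11.
  }

  public static void main(String[] args) {
    segmentDataManagerMap.put("X", "mutable-segment-backed-by-consumer-dir-mmap");
    // The transitions actually observed on S2 around the reset:
    onTransition("X", State.ERROR, State.OFFLINE);
    onTransition("X", State.OFFLINE, State.ONLINE);
    // The stale entry survives, so queries still hit the mmap files.
    System.out.println("stale entry present: " + segmentDataManagerMap.containsKey("X"));
  }
}
```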
Some improvements found during this debugging:
- To prevent a segment from getting stuck in this state, we can flush the segment from `_segmentDataManagerMap` in the `SegmentOnlineOfflineStateModel.onBecomeOfflineFromError()` transition
- To make Pinot serve queries from the mmap files of the sealed segment, can we persist `_segmentDataManagerMap` (to ZK, possibly), so that even if the node is restarted we can still serve?
- Not sure if converting mmap consumer files into an immutable segment is currently possible with Pinot - looking to see if we can have this?
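The first bullet could look roughly like the sketch below. This is only an illustration under my own names and signatures: Pinot's real `onBecomeOfflineFromError()` takes Helix message/context arguments and the real data manager needs a reference-counted release, neither of which this toy map models.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed fix: on ERROR -> OFFLINE, drop the
// (possibly mutable) segment so a later reset peer-downloads a fresh copy
// instead of silently reusing leftover consumer-dir mmap files.
public class OfflineFromErrorFix {
  static final Map<String, Object> segmentDataManagerMap = new ConcurrentHashMap<>();

  static void onBecomeOfflineFromError(String segmentName) {
    // Today this transition only logs; the proposal is to also release the
    // segment's data manager here.
    Object removed = segmentDataManagerMap.remove(segmentName);
    if (removed != null) {
      System.out.println("Released data manager for segment: " + segmentName);
    }
  }

  public static void main(String[] args) {
    segmentDataManagerMap.put("X", new Object());
    onBecomeOfflineFromError("X");
    // With the map entry gone, a subsequent OFFLINE -> ONLINE transition
    // would have to fetch the segment from a peer or deep store.
    System.out.println("still in map: " + segmentDataManagerMap.containsKey("X"));
  }
}
```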
Looking forward to hearing whether others have faced a similar kind of issue, and open to suggestions for improvements and additional guardrails we can set up to prevent such occurrences.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]