sunithabeeram opened a new issue #4070: Handle errors in consuming from streams 
consistently, especially across HL and LLRealtimeSegmentDataManagers
URL: https://github.com/apache/incubator-pinot/issues/4070
 
 
   Errors can occur in the following phases of realtime consumption:
   - Reading a record (or a batch of them) from the stream
   - Transforming the generic row
   - Indexing the row
   
   Currently, in LLRealtimeSegmentDataManager, if we fail to read/consume from 
the stream, either due to a permanent error or after exhausting the retries on 
transient errors, the data-manager is put in error state and consumption stops. 
HLRealtimeSegmentDataManager, in contrast, swallows the exception.
   
   Both managers skip over any errors during transformation and indexing. In 
HLRealtimeSegmentDataManager these failures are counted, but the count is not 
reported until the segment is deemed "full", a condition that the 
transform/index failures themselves can prevent from being met, so the failures 
go undetected. In LLRealtimeSegmentDataManager, we emit metrics about the 
failures after each consumeLoop().
   
   It would be better to define the handling of these errors explicitly. 
Specifically:
   - On failure to read from stream, put the manager in error state.
   - On failure to transform or index row - keep track of the count of such 
failures and if more than a threshold number of failures are encountered, stop 
consumption. 
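
   The two rules above could be sketched roughly as follows. This is only an 
illustration of the proposed policy, not existing Pinot code: the class 
ConsumptionErrorPolicy, its methods, and the maxRowFailures threshold are all 
hypothetical names introduced here.

```java
// Hypothetical sketch of the proposed policy; none of these names exist in Pinot.
class ConsumptionErrorPolicy {
    enum State { CONSUMING, ERROR }

    private final int maxRowFailures; // assumed configurable threshold
    private int rowFailures = 0;
    private State state = State.CONSUMING;

    ConsumptionErrorPolicy(int maxRowFailures) {
        if (maxRowFailures < 0) {
            throw new IllegalArgumentException("maxRowFailures must be >= 0");
        }
        this.maxRowFailures = maxRowFailures;
    }

    // Rule 1: a failure to read from the stream (permanent error, or retries
    // exhausted) puts the manager in error state immediately.
    void onStreamReadFailure(Exception cause) {
        state = State.ERROR;
    }

    // Rule 2: transform/index failures are counted; consumption stops only
    // once MORE than the threshold number of failures has been seen.
    void onRowFailure(Exception cause) {
        rowFailures++;
        if (rowFailures > maxRowFailures) {
            state = State.ERROR;
        }
    }

    boolean shouldContinue() {
        return state == State.CONSUMING;
    }

    int getRowFailures() {
        return rowFailures;
    }

    State getState() {
        return state;
    }

    public static void main(String[] args) {
        ConsumptionErrorPolicy policy = new ConsumptionErrorPolicy(2);
        policy.onRowFailure(new RuntimeException("transform failed"));
        policy.onRowFailure(new RuntimeException("index failed"));
        System.out.println("after 2 row failures: " + policy.getState());
        policy.onRowFailure(new RuntimeException("index failed again"));
        System.out.println("after 3 row failures: " + policy.getState());
    }
}
```

   Both data managers could consult shouldContinue() in their consume loops, 
which would give HL and LL the same behaviour; how the threshold is chosen and 
configured is left open here.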
   
   These should be rare occurrences, so stopping consumption would help 
understand issues better than proceeding silently (and potentially serving 
incomplete/partial data).
