sunithabeeram opened a new issue #4070: Handle errors in consuming from streams consistently, especially across HL and LLRealtimeSegmentDataManagers
URL: https://github.com/apache/incubator-pinot/issues/4070

Errors can occur in the following phases of realtime consumption:
- Reading a record (or a batch of them) from the stream
- Transforming the generic row
- Indexing the row

Currently, in LLRealtimeSegmentDataManager, if we fail to read/consume from the stream, either due to a permanent error or after exhausting the number of retries on transient errors, the data manager is put in error state and consumption stops. HLRealtimeSegmentDataManager swallows the exception.

Both managers skip over any errors during transformation and indexing. In HLRealtimeSegmentDataManager these failures are counted, but the count is not reported until the segment is deemed "full", a condition that won't be satisfied because of the transform/index failures, so the failures go undetected. In LLRealtimeSegmentDataManager, we emit metrics about the failures after a consumeLoop().

It would be better to have well-defined handling of these errors. Specifically:
- On failure to read from the stream, put the manager in error state.
- On failure to transform or index a row, keep track of the count of such failures, and stop consumption if more than a threshold number of failures are encountered.

These should be rare occurrences, so stopping consumption would help us understand issues better than proceeding silently (and potentially serving incomplete/partial data).
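
To make the proposal concrete, here is a minimal sketch of the two rules as a small policy object that either data manager could consult from its consume loop. This is only an illustration: `ConsumptionErrorPolicy`, its methods, and the `maxRowFailures` threshold are hypothetical names, not existing Pinot classes or config keys, and it assumes a single consumer thread per policy instance.

```java
// Hypothetical sketch; none of these names exist in Pinot today.
final class ConsumptionErrorPolicy {
    enum State { CONSUMING, ERROR }

    // Assumed threshold; in practice this would likely come from table/stream config.
    private final long maxRowFailures;
    private long rowFailures = 0;
    private State state = State.CONSUMING;

    ConsumptionErrorPolicy(long maxRowFailures) {
        if (maxRowFailures < 0) {
            throw new IllegalArgumentException("maxRowFailures must be >= 0");
        }
        this.maxRowFailures = maxRowFailures;
    }

    // Rule 1: a permanent read error, or retries exhausted on transient
    // errors, puts the manager in error state immediately.
    void onStreamReadFailure() {
        state = State.ERROR;
    }

    // Rule 2: a transform or index failure is counted; consumption stops
    // once MORE than the threshold number of failures have been seen.
    void onRowFailure() {
        rowFailures++;
        if (rowFailures > maxRowFailures) {
            state = State.ERROR;
        }
    }

    // The consume loop would check this before fetching the next batch.
    boolean shouldContinue() {
        return state == State.CONSUMING;
    }

    long rowFailures() {
        return rowFailures;
    }

    State state() {
        return state;
    }
}
```

With a shared policy like this, both managers would call `onStreamReadFailure()` where HLRealtimeSegmentDataManager currently swallows the exception, and `onRowFailure()` wherever a transform or index error is skipped today, so the failure count is acted on during consumption rather than only when the segment completes.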
