dclim commented on issue #7428: Add errors and state to stream supervisor 
status API endpoint
URL: https://github.com/apache/incubator-druid/pull/7428#issuecomment-489297940
 
 
   In `128edad` I made some modifications to the implementation along two main 
lines:
   
   1) After some consideration, I decided to remove the whole concept of 
classifying exceptions by their transience as suggested in 
https://github.com/apache/incubator-druid/pull/7428#discussion_r277116304. I 
think it added more complexity than value and, more importantly, could mislead 
users when we classify an error as transient but in reality it will never 
recover without user intervention. Some examples: in Kafka, 
`TimeoutException` gets classified as 'transient' if we've previously had a 
successful run, but without knowing why the timeout is happening, how could you 
know if it would ever resolve? Is it because the network was congested 
momentarily, or is it because the Kafka broker got zapped by lightning and is 
now a smoldering pile of ashes? In Kinesis, the generic 
`AmazonKinesisException` gets classified as 'non-transient', but I'd bet that 
there is, or in a future release will be, a subclass that actually represents 
a transient failure and that we haven't accounted for, possibly because it 
hasn't been written yet. Bottom line: trying to classify exceptions is 
fragile, so it's better not to try. 
   
   2) To resolve the issue mentioned in 
https://github.com/apache/incubator-druid/pull/7428#discussion_r278358074 and 
to remove the transience concept per 1), I modified 
`SeekableStreamSupervisorStateManager` fairly heavily relative to the 
original implementation. Most of the other files remain largely the same. I 
added some missed state capture points in `SeekableStreamSupervisor` and 
removed some that were capturing failures in code outside the run loop (e.g. I 
don't want the supervisor reporting an unhealthy state if someone repeatedly 
hits a status endpoint with a bad request while the main loop is fine).
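
   As a rough illustration of the two points above (not the code in 
`128edad`): the sketch below records run-loop errors without classifying 
them, and derives health only from what happens inside the run loop. The 
class `RunLoopStateTracker`, its methods, and the "unhealthy after N 
consecutive failed runs" rule are all hypothetical names and assumptions made 
for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch only; not the actual SeekableStreamSupervisorStateManager.
public class RunLoopStateTracker {
    public enum State { RUNNING, UNHEALTHY }

    // Assumed settings: failed runs in a row before UNHEALTHY, and errors kept.
    private final int unhealthyThreshold;
    private final int maxStoredErrors;

    private final Deque<Throwable> recentErrors = new ArrayDeque<>();
    private int consecutiveFailedRuns = 0;
    private boolean currentRunFailed = false;

    public RunLoopStateTracker(int unhealthyThreshold, int maxStoredErrors) {
        if (unhealthyThreshold < 1 || maxStoredErrors < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.unhealthyThreshold = unhealthyThreshold;
        this.maxStoredErrors = maxStoredErrors;
    }

    // Call this only from the main run loop, never from request handlers,
    // so a bad request to a status endpoint cannot affect the state.
    // Every error is treated the same way: no transient/non-transient split.
    public synchronized void recordRunLoopError(Throwable t) {
        currentRunFailed = true;
        if (recentErrors.size() == maxStoredErrors) {
            recentErrors.removeFirst(); // drop the oldest stored error
        }
        recentErrors.addLast(t);
    }

    // Call once at the end of each run-loop iteration.
    public synchronized void markRunFinished() {
        consecutiveFailedRuns = currentRunFailed ? consecutiveFailedRuns + 1 : 0;
        currentRunFailed = false;
    }

    public synchronized State getState() {
        return consecutiveFailedRuns >= unhealthyThreshold
            ? State.UNHEALTHY
            : State.RUNNING;
    }

    // Raw errors are exposed so the user can judge whether they will resolve.
    public synchronized List<Throwable> getRecentErrors() {
        return new ArrayList<>(recentErrors);
    }
}
```

   The point of the design is that the tracker never guesses whether an 
error will go away; it only reports what failed and how many runs in a row 
have failed, and a single clean run resets the count.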

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
