[jira] [Assigned] (KAFKA-12523) Need to improve handling of TimeoutException when committing offsets

A. Sophie Blee-Goldman (Jira) Mon, 22 Mar 2021 20:42:14 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


A. Sophie Blee-Goldman reassigned KAFKA-12523:
----------------------------------------------

    Assignee: A. Sophie Blee-Goldman

> Need to improve handling of TimeoutException when committing offsets
> --------------------------------------------------------------------
>
>                 Key: KAFKA-12523
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12523
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.8.0
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: A. Sophie Blee-Goldman
>            Priority: Blocker
>             Fix For: 2.8.0
>
>
> Right now, in TaskManager#commitOffsetsOrTransaction, if we catch a 
> TimeoutException then under ALOS we just rethrow it, while under EOS we 
> rethrow it as a TaskCorruptedException. The problem is that 
> commitOffsetsOrTransaction can be invoked from several places:
> # Commit within the StreamThread main processing loop (either user requested 
> or the commit interval has elapsed): this is presumably the case we had in 
> mind when deciding how to handle the TimeoutException in 
> commitOffsetsOrTransaction, so no problem here
> # Clean shutdown of application: a bit weird to throw a 
> TaskCorruptedException in this case, but it’ll just end up being caught and 
> forcing a closeDirty, so again no problem here
> # From TaskManager#handleRevocation: in this case, it’s possible we hit a 
> TimeoutException on a task that’s actually being revoked. This exception will 
> be saved and rethrown from poll, so under EOS we would catch a 
> TaskCorruptedException and then try to revive this task that we actually no 
> longer own. Pretty sure this will cause an NPE in the TaskManager. Under 
> ALOS, the rethrown TimeoutException will be bubbled up through poll again, 
> but unlike TaskCorruptedException we actually don’t catch TimeoutException 
> anywhere in the StreamThread loop. This will trigger the uncaught exception 
> handler
> # From TaskManager#handleTaskCorrupted: this method is itself invoked from 
> within the catch TaskCorruptedException block of the StreamThread’s runLoop. 
> If we throw TaskCorruptedException again then I believe we won’t even catch 
> this in the safety net catch Throwable block of the runLoop -- it’ll just be 
> thrown directly up through run(). 
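
The handleRevocation case above can be sketched as a toy model. This is not Kafka Streams code: the class name, the exception types (local stand-ins that only share names with the real ones), the Mode enum, the task ids and the returned labels are all assumptions made for illustration. It only models the control flow described in the ticket: a timeout during commit is rethrown as-is under ALOS and wrapped in a TaskCorruptedException under EOS, and the caller then reacts to a task that has already been revoked.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model only: these types are hypothetical stand-ins, not the Kafka classes.
class CommitTimeoutSketch {

    static class TimeoutException extends RuntimeException {
        TimeoutException(String message) { super(message); }
    }

    static class TaskCorruptedException extends RuntimeException {
        final Set<String> taskIds;
        TaskCorruptedException(Set<String> taskIds, Throwable cause) {
            super("tasks corrupted", cause);
            this.taskIds = taskIds;
        }
    }

    enum Mode { ALOS, EOS }

    // Mirrors the behavior described in the ticket: ALOS rethrows the
    // timeout, EOS rethrows it wrapped as TaskCorruptedException.
    static void commitOffsetsOrTransaction(Mode mode, Set<String> tasks, boolean timesOut) {
        if (!timesOut) {
            return;
        }
        TimeoutException timeout = new TimeoutException("commit timed out");
        if (mode == Mode.EOS) {
            throw new TaskCorruptedException(tasks, timeout);
        }
        throw timeout;
    }

    // Simulates a commit that times out while task "0_1" is being revoked,
    // then reports what the (modeled) thread loop would do next.
    static String simulateRevocation(Mode mode) {
        Set<String> owned = new HashSet<>();
        owned.add("0_0");
        owned.add("0_1");
        Set<String> revoked = Collections.singleton("0_1");
        try {
            commitOffsetsOrTransaction(mode, revoked, true);
            owned.removeAll(revoked);
            return "COMMITTED";
        } catch (TaskCorruptedException e) {
            // The revocation still completes, so the task is no longer owned.
            owned.removeAll(revoked);
            for (String id : e.taskIds) {
                if (!owned.contains(id)) {
                    // Reviving here is the step the ticket expects to fail.
                    return "REVIVE_UNOWNED_TASK";
                }
            }
            return "REVIVED";
        } catch (TimeoutException e) {
            // Nothing in the modeled loop catches this, so it escapes.
            return "UNCAUGHT_EXCEPTION_HANDLER";
        }
    }

    public static void main(String[] args) {
        System.out.println("EOS:  " + simulateRevocation(Mode.EOS));
        System.out.println("ALOS: " + simulateRevocation(Mode.ALOS));
    }
}
```

In this model, both modes end somewhere undesirable for the revocation call site, which suggests the handling may need to depend on where commitOffsetsOrTransaction was invoked from rather than only on the processing mode.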



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
