[
https://issues.apache.org/jira/browse/KAFKA-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A. Sophie Blee-Goldman reassigned KAFKA-12523:
----------------------------------------------
Assignee: A. Sophie Blee-Goldman
> Need to improve handling of TimeoutException when committing offsets
> --------------------------------------------------------------------
>
> Key: KAFKA-12523
> URL: https://issues.apache.org/jira/browse/KAFKA-12523
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.8.0
> Reporter: A. Sophie Blee-Goldman
> Assignee: A. Sophie Blee-Goldman
> Priority: Blocker
> Fix For: 2.8.0
>
>
> Right now, in TaskManager#commitOffsetsOrTransaction, if we catch a
> TimeoutException, then under ALOS we just rethrow it, while under EOS we
> rethrow it as a TaskCorruptedException. The problem is that
> commitOffsetsOrTransaction can be invoked from several places:
> # Commit within the StreamThread main processing loop (either user requested
> or commit interval has elapsed): this is presumably the case we had in mind
> when deciding how to handle the TimeoutException in
> commitOffsetsOrTransaction, so no problem here
> # Clean shutdown of application: a bit weird to throw a
> TaskCorruptedException in this case, but it’ll just end up being caught and
> forcing a closeDirty, so again no problem here
> # From TaskManager#handleRevocation: in this case, it’s possible we hit a
> TimeoutException on a task that’s actually being revoked. This exception will
> be saved and rethrown from poll, so under EOS we would catch a
> TaskCorruptedException and then try to revive this task that we actually no
> longer own. Pretty sure this will cause an NPE in the TaskManager. Under
> ALOS, the rethrown TimeoutException will be bubbled up through poll again,
> but unlike TaskCorruptedException we actually don’t catch TimeoutException
> anywhere in the StreamThread loop, so it will trigger the uncaught exception
> handler.
> # From TaskManager#handleTaskCorrupted: this method is itself invoked from
> within the catch TaskCorruptedException block of the StreamThread’s runLoop.
> If we throw TaskCorruptedException again then I believe we won’t even catch
> this in the safety net catch Throwable block of the runLoop -- it’ll just be
> thrown directly up through run().
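
The last case above rests on a general Java rule: an exception thrown from inside a catch block is not handled by the sibling catch clauses of the same try statement. A minimal sketch of that rule follows. It does not use the real Kafka Streams classes: SketchTaskCorruptedException, commit and runOnce are hypothetical stand-ins invented for illustration, and the outer try exists only to make the escaping exception observable.

```java
public class NestedCatchSketch {

    // Hypothetical stand-in for Kafka Streams' TaskCorruptedException; not the real class.
    static class SketchTaskCorruptedException extends RuntimeException {
        SketchTaskCorruptedException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for a commit that times out under EOS and is
    // rethrown as a task-corrupted exception.
    static void commit(boolean fail) {
        if (fail) {
            throw new SketchTaskCorruptedException("commit timed out");
        }
    }

    // Models one iteration of a run loop whose corrupted-task handler commits again.
    static String runOnce(boolean retryFails) {
        try {
            try {
                commit(true);            // the first commit always fails in this sketch
                return "committed";
            } catch (SketchTaskCorruptedException e) {
                commit(retryFails);      // a failure here bypasses the sibling catch below
                return "recovered";
            } catch (Throwable t) {
                return "safety net";     // never reached for exceptions thrown by the handler above
            }
        } catch (RuntimeException escaped) {
            // Outer try added only so the sketch can report what escaped.
            return "escaped: " + escaped.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(false)); // handler succeeds
        System.out.println(runOnce(true));  // handler throws again
    }
}
```

In this sketch, when the second commit also fails, the exception skips the catch Throwable clause and surfaces only in the enclosing try, which is the shape of the concern raised for handleTaskCorrupted.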