[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481998#comment-17481998
 ] 

Randall Hauch edited comment on KAFKA-12879 at 1/25/22, 5:52 PM:
-----------------------------------------------------------------

The original intent of 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339]'s changes was 
to retry `listOffsets(...)` if a retriable exception was thrown, as other 
methods within the AdminClient automatically handle retries. In hindsight, I 
should have sought clarification on that change, since 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484],
 which added `listOffsets(...)`, was ambiguous about retries, while 
[KIP-117|https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations],
 which added `AdminClient`, included automatic retry support.

Having said that, we need to decide whether to:
1. Revert the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` does not retry. IMO this would leave the `AdminClient` in a 
strange state where some methods retry and others don't, with no documentation 
about which methods do and do not retry. We would also have to change the 
Connect code that uses this to perform the retries, though that's doable.
2. Keep the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` still retries on retriable exceptions, but throws 
`UnknownTopicOrPartitionException` when the topic does not exist (after 
successive retries) rather than the timeout exception.
3. Keep as-is and simply better document the behavior, perhaps by making an 
addendum to KIP-396.
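
To make option 2 concrete, here is a minimal sketch of the retry semantics it describes: retry while the failure is retriable, and once the attempts are used up, surface the last retriable exception itself rather than a timeout. This is not the `AdminClient` implementation. `RetrySketch`, `callWithRetries`, and `maxAttempts` are hypothetical names, and the two exception classes are JDK-only stand-ins that merely borrow the names of Kafka's types so the example compiles without the Kafka client on the classpath.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for Kafka's exception types (assumption: names only,
// not the real classes from org.apache.kafka.common.errors).
class RetriableException extends RuntimeException {
    RetriableException(String message) { super(message); }
}

class UnknownTopicOrPartitionException extends RetriableException {
    UnknownTopicOrPartitionException(String message) { super(message); }
}

final class RetrySketch {
    private RetrySketch() {}

    /**
     * Calls the operation up to maxAttempts times, retrying only when it
     * throws a RetriableException. When the attempts run out, the last
     * retriable exception is rethrown as-is instead of being replaced
     * by a timeout.
     */
    static <T> T callWithRetries(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetriableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RetriableException e) {
                last = e; // remember the failure; a real client would back off here
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```

With this shape, a caller that already catches `UnknownTopicOrPartitionException` keeps working, while transient failures are still retried.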

WDYT, [~mimaison], [~cmccabe], and others?


was (Author: rhauch):
The original intent of 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339]'s changes was 
to retry `listOffsets(...)` if a retriable exception was thrown, as other 
methods within the AdminClient automatically handle retries. In hindsight, I 
should have sought clarification on that change, since 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484],
 which added `listOffsets(...)`, was ambiguous about retries, while 
[KIP-117|https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations],
 which added `AdminClient`, included automatic retry support.

Having said that, we need to decide whether to:
1. Revert the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` does not retry. IMO this would leave the `AdminClient` in a 
strange state where some methods retry and others don't, with no documentation 
about which methods do and do not retry. We would also have to change the 
Connect code that uses this to perform the retries, though that's doable.
2. Keep the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` still retries on retriable exceptions, but throws 
`UnknownTopicOrPartitionException` when the topic does not exist (after 
successive retries) rather than the timeout exception.
3. Keep as-is and simply better document the behavior.

I suspect option 3 is not really acceptable.

WDYT, [~mimaison], [~cmccabe], and others?

> Compatibility break in Admin.listOffsets()
> ------------------------------------------
>
>                 Key: KAFKA-12879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12879
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>    Affects Versions: 2.8.0, 2.7.1, 2.6.2
>            Reporter: Tom Bentley
>            Assignee: Kirk True
>            Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries are 
> desirable.
> Furthermore, besides the intended effect on {{listOffsets()}}, it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.
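
The "inspect the cause" half of the workaround in the description above could look roughly like the following sketch, which walks an exception's cause chain looking for a given type. It uses only the JDK. `CauseInspector` and `findCause` are hypothetical names, and the sketch takes the description's word that the original exception is reachable as a cause of the timeout; that is not verified here.

```java
import java.util.Optional;

final class CauseInspector {
    private CauseInspector() {}

    /**
     * Walks the cause chain of error, starting with error itself, and
     * returns the first throwable that is an instance of type, if any.
     */
    static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        // Bound the walk so that a malformed, cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }
}
```

In real client code the call would be something like `CauseInspector.findCause(e, UnknownTopicOrPartitionException.class)` inside the `catch (ExecutionException e)` block around `KafkaFuture.get()`. As the description notes, this only helps if the client is configured so that the cause survives, which is awkward when the same Admin instance serves other calls.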



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
