[jira] [Commented] (KAFKA-12879) Compatibility break in Admin.listOffsets()

Colin McCabe (Jira) Tue, 01 Feb 2022 13:29:06 -0800


    [ https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485496#comment-17485496 ]


Colin McCabe commented on KAFKA-12879:
--------------------------------------

Let me give a little context here on the behavior.

The Producer and Consumer typically retry most operations if the partition in 
question doesn't exist. The thinking there is that if the user specified that 
they want to consume from partition foo-0, they knew what they were doing, and 
we should just wait for foo-0 to appear. This is particularly useful because 
Kafka has eventually consistent metadata: even after creating a topic, it may 
take a few seconds for every broker to become aware of the new topic.

For the AdminClient, we usually don't retry if a topic doesn't exist. For 
example, if you try to delete a topic that doesn't exist, we don't keep retrying 
until the request times out; we just return UNKNOWN_TOPIC_OR_PARTITION 
immediately. You could view this as inconsistent, but being consistent with the 
Producer and Consumer here would result in a somewhat useless API. People do not 
want their topic deletes to take a long time and then fail with 
TimeoutException if the topic doesn't exist.
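A toy model can make this trade-off concrete. The sketch below is a hypothetical, pure-JDK simulation, not Kafka code:

- `LaggingMetadata` stands in for a broker whose metadata only sees a topic after a few lookups.
- `UnknownTopicException` stands in for Kafka's `UnknownTopicOrPartitionException`.
- `java.util.concurrent.TimeoutException` stands in for Kafka's own `TimeoutException`.

The model shows the two sides of retrying. A retrying policy rides out propagation lag for a freshly created topic. For a topic that never exists, though, it can only report a timeout, and the more specific "unknown topic" cause is lost.

```java
import java.util.Set;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical stand-in for Kafka's UnknownTopicOrPartitionException.
class UnknownTopicException extends RuntimeException {
    UnknownTopicException(String topic) {
        super("Unknown topic: " + topic);
    }
}

// Hypothetical model of eventually consistent metadata: the first
// `lookupsUntilVisible` lookups miss, regardless of whether the topic exists.
class LaggingMetadata implements Predicate<String> {
    private final Set<String> topics;
    private int lookupsUntilVisible;

    LaggingMetadata(Set<String> topics, int lookupsUntilVisible) {
        this.topics = topics;
        this.lookupsUntilVisible = lookupsUntilVisible;
    }

    @Override
    public boolean test(String topic) {
        if (lookupsUntilVisible > 0) {
            lookupsUntilVisible--;
            return false; // metadata has not propagated yet
        }
        return topics.contains(topic);
    }
}

final class RetryPolicies {
    private RetryPolicies() {}

    // Admin-style policy: report the missing topic on the first failed lookup.
    static void failFast(Predicate<String> metadata, String topic) {
        if (!metadata.test(topic)) {
            throw new UnknownTopicException(topic);
        }
    }

    // Producer/Consumer-style policy: keep retrying; if the topic never shows
    // up, the caller only learns that the attempts ran out.
    static int retryUntilTimeout(Predicate<String> metadata, String topic, int maxAttempts)
            throws TimeoutException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (metadata.test(topic)) {
                return attempt; // number of attempts it took to see the topic
            }
        }
        throw new TimeoutException("Topic " + topic + " not found after " + maxAttempts + " attempts");
    }
}
```

With two lagging lookups, `retryUntilTimeout(..., "foo", 5)` succeeds on attempt 3. The same call for a nonexistent "bar" ends in a timeout. `failFast` reports "bar" as unknown immediately, but it would also misreport "foo" during the lag window.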

I would argue that listOffsets is more similar to the second case here. It's 
very rare to invoke listOffsets on a partition that has just been created, and 
retrying until the request times out when the partition doesn't exist isn't 
useful behavior in most scenarios. Connect does seem to have a use case for 
this, but since Connect knows for sure that the topic exists (or will exist), it 
should do the retries itself rather than pushing them into AdminClient.
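When a caller such as Connect knows that the partition exists or will exist, the retry can live on the caller's side. Below is a minimal sketch of such a loop. `CallerRetry` and `retryWhile` are hypothetical names, not part of Kafka or Connect. The sketch assumes a blocking call in the style of `Future.get()`, which wraps failures in `ExecutionException`.

The loop retries only while the unwrapped cause matches a caller-supplied predicate. It gives up once the next backoff would overrun the deadline, and it rethrows the last exception exactly as it was thrown.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.function.Predicate;

final class CallerRetry {
    private CallerRetry() {}

    /**
     * Calls {@code call} until it succeeds, fails with a non-retriable cause,
     * or the next backoff would overrun {@code timeout}. Failures wrapped in
     * ExecutionException (as thrown by Future.get()) are unwrapped before the
     * predicate is checked; the original exception is what gets rethrown.
     */
    static <T> T retryWhile(Callable<T> call, Predicate<Throwable> retriable,
                            Duration timeout, Duration backoff) throws Exception {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                Throwable cause = (e instanceof ExecutionException && e.getCause() != null)
                        ? e.getCause() : e;
                // Overflow-safe comparison of nanoTime values.
                boolean outOfTime = deadline - System.nanoTime() < backoff.toNanos();
                if (!retriable.test(cause) || outOfTime) {
                    throw e;
                }
                Thread.sleep(backoff.toMillis());
            }
        }
    }
}
```

This assumes listOffsets fails fast with `UnknownTopicOrPartitionException`, as proposed above. With kafka-clients on the classpath, a caller might then pass `() -> admin.listOffsets(Map.of(tp, OffsetSpec.latest())).all().get()` and the predicate `c -> c instanceof UnknownTopicOrPartitionException`. That way only this one call waits for the partition. Other calls through the same `Admin` instance keep the client's configured retry behavior.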

So I would argue we should just revert the change.

Also, as to the "without documentation" part: we do make an effort to 
document the exceptions admin methods can throw, although we're missing a lot 
of them (PRs would be very welcome here!). For example, listPartitionReassignments 
documents that it can return UnknownTopicOrPartitionException, 
ClusterAuthorizationException, TimeoutException, etc. If we revert the change, 
we should also add this kind of documentation to the listOffsets function.

> Compatibility break in Admin.listOffsets()
> ------------------------------------------
>
>                 Key: KAFKA-12879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12879
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>    Affects Versions: 2.8.0, 2.7.1, 2.6.2
>            Reporter: Tom Bentley
>            Assignee: Kirk True
>            Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore, as well as the intended effect on {{listOffsets()}}, it seems that 
> the change could also affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.
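The workaround quoted above relies on finding the original `UnknownTopicOrPartitionException` somewhere in the cause chain of the failure. A small pure-JDK helper for that search is sketched below. `Causes.findCause` is a hypothetical name, not a Kafka API. It walks the `getCause()` links, guarding against cycles, and returns the first throwable of the requested type.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

final class Causes {
    private Causes() {}

    /**
     * Returns the first throwable in {@code t}'s cause chain (starting with
     * {@code t} itself) that is an instance of {@code type}.
     */
    static <X extends Throwable> Optional<X> findCause(Throwable t, Class<X> type) {
        // Track visited throwables by identity so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }
}
```

A client would be configured with `AdminClientConfig.RETRIES_CONFIG` set to 1. The caller would then apply `Causes.findCause(e, UnknownTopicOrPartitionException.class)` to the `ExecutionException` thrown by `admin.listOffsets(request).all().get()`, where `request` is a placeholder. This assumes the client attaches that cause to its `TimeoutException`, which is exactly the kind of undocumented behavior the issue describes.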



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
