[ https://issues.apache.org/jira/browse/KAFKA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justine Olshan updated KAFKA-13102:
-----------------------------------
Description:
Currently, the fetch path for replicas relies on the topic IDs in the metadata
cache. However, topic ID information is propagated through the UpdateMetadata
request, which is too slow. At first the topic has no ID in the metadata cache,
so we send an older request version; once we get the ID, we have to close the
session. This is likely to happen on broker startup and with new topics. It has
resulted in more partitions in error and frequent closing of sessions, and has
made tests like ConsumerBounceTest#testCloseDuringRebalance extremely flaky.
A quick test with topic IDs stored in the replica manager during the handling
of LeaderAndIsr (LISR) requests showed significantly fewer errors and made
ConsumerBounceTest#testCloseDuringRebalance much less flaky (passing 50/50 runs
vs. 11/50 runs).
The task now is to figure out the best strategy for storing topic IDs for the
fetch path using the IDs from the LISR request.
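The session churn described above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not Kafka code: TopicIdSource,
FetchSession and simulate are hypothetical names, and the model assumes that a
session opened without topic IDs must be closed as soon as an ID becomes
visible to the fetch path.

```python
import uuid
from typing import Dict, Optional


class TopicIdSource:
    """Hypothetical lookup table mapping topic name -> topic ID."""

    def __init__(self) -> None:
        self._ids: Dict[str, uuid.UUID] = {}

    def put(self, topic: str, topic_id: uuid.UUID) -> None:
        self._ids[topic] = topic_id

    def get(self, topic: str) -> Optional[uuid.UUID]:
        return self._ids.get(topic)


class FetchSession:
    """Toy fetch session that remembers whether it was opened with topic IDs."""

    def __init__(self, uses_topic_ids: bool) -> None:
        self.uses_topic_ids = uses_topic_ids


def simulate(ids_known_at_first_fetch: bool) -> int:
    """Return how many sessions get closed over two fetches of one new topic."""
    source = TopicIdSource()
    topic, topic_id = "new-topic", uuid.uuid4()
    closed = 0
    session: Optional[FetchSession] = None

    if ids_known_at_first_fetch:
        # Assumption: the ID was recorded while handling the LISR request.
        source.put(topic, topic_id)

    for attempt in range(2):
        has_id = source.get(topic) is not None
        # A session opened without IDs cannot continue once the ID is known.
        if session is not None and session.uses_topic_ids != has_id:
            closed += 1
            session = None
        if session is None:
            session = FetchSession(uses_topic_ids=has_id)
        if attempt == 0 and not ids_known_at_first_fetch:
            # Assumption: the ID only arrives later, via UpdateMetadata.
            source.put(topic, topic_id)

    return closed
```

In this model, simulate(False) closes one session because the ID shows up
between the two fetches, while simulate(True) closes none.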
was:
Currently, the fetch path for replicas relies on the topic IDs in the metadata
cache. However, the propagation of topic ID information is done through the
UpdateMetadata request and is too slow. This has resulted in increased
partitions in error, frequent closing of sessions and made tests like
ConsumerBounceTest#testCloseDuringRebalance extremely flaky.
A quick test with topic IDs stored in the replica manager during the handling
of LISR requests showed significantly fewer errors and made
ConsumerBounceTest#testCloseDuringRebalance much less flaky (passing 50/50 runs
vs. 11/50 runs).
The task now is figuring out the best strategy to store topic IDs for the fetch
path using the IDs from the LISR request.
> Topic IDs not propagated to metadata cache quickly enough for Fetch path
> ------------------------------------------------------------------------
>
> Key: KAFKA-13102
> URL: https://issues.apache.org/jira/browse/KAFKA-13102
> Project: Kafka
> Issue Type: Bug
> Reporter: Justine Olshan
> Assignee: Justine Olshan
> Priority: Major
>
> Currently, the fetch path for replicas relies on the topic IDs in the
> metadata cache. However, topic ID information is propagated through the
> UpdateMetadata request, which is too slow. At first the topic has no ID in
> the metadata cache, so we send an older request version; once we get the ID,
> we have to close the session. This is likely to happen on broker startup and
> with new topics. It has resulted in more partitions in error and frequent
> closing of sessions, and has made tests like
> ConsumerBounceTest#testCloseDuringRebalance extremely flaky.
> A quick test with topic IDs stored in the replica manager during the handling
> of LeaderAndIsr (LISR) requests showed significantly fewer errors and made
> ConsumerBounceTest#testCloseDuringRebalance much less flaky (passing 50/50
> runs vs. 11/50 runs).
> The task now is to figure out the best strategy for storing topic IDs for the
> fetch path using the IDs from the LISR request.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)