[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817412#comment-17817412 ]
Omnia Ibrahim commented on KAFKA-16212:
---------------------------------------

At the moment, the cache in `ReplicaManager.allPartitions` is represented as `Pool[TopicPartition, HostedPartition]`, which is a wrapper around `ConcurrentHashMap[TopicPartition, HostedPartition]`. Updating this to use `TopicIdPartition` as the key turns out to be tricky because:
# Not all APIs that interact with `ReplicaManager` to fetch or update the partition cache are aware of the topic ID, for example the consumer coordinator and the handling of some requests in `KafkaApis` where the request schema doesn't have a topic ID.
# The topic ID is represented as optional in many places, which means we might end up populating it with null or a dummy UUID multiple times in order to construct a `TopicIdPartition`.

I have 3 proposals at the moment:
* *Proposal #1:* Update `TopicIdPartition` to have a constructor that takes the topic ID as optional, and change `ReplicaManager.allPartitions` to be `LinkedBlockingQueue[TopicIdPartition, HostedPartition]`. _*This might be the simplest one as far as I can see.*_
** Any API that is not topic-ID aware will just get the last entry that matches `topicIdPartition.topicPartition`.
** The code will need to make sure that we don't have duplicates by `TopicIdPartition` in the `LinkedBlockingQueue`.
** We will need to revert the optional topic ID in `TopicIdPartition` once everywhere in Kafka is topic-ID aware.
* *Proposal #2:* Change `ReplicaManager.allPartitions` to `new Pool[TopicPartition, LinkedBlockingQueue[(Option[Uuid], HostedPartition)]]`, where `Option[Uuid]` represents the topic ID. This makes the cache scheme a bit complex. Under this proposal:
** The last entry in the `LinkedBlockingQueue` is considered the current value.
** The code will make sure that the `LinkedBlockingQueue` has only one entry per topic ID.
** Topic-ID-aware APIs that need to fetch or update the partition will be updated to use `TopicPartition` and the topic ID.
** APIs that are not topic-ID aware will keep using topic partitions, and the `ReplicaManager` will assume that these APIs refer to the last entry in the `LinkedBlockingQueue`.
* *Proposal #3:* The other option is to keep two separate caches: one `Pool[TopicIdPartition, HostedPartition]` for partitions, and another `Pool[TopicPartition, Uuid]` holding the last assigned topic ID for each partition, which is used to form the `TopicIdPartition`. This is my least favourite, as having 2 caches risks one of them going out of date at any time.

[~jolshan] Do you have any strong preferences? I am leaning toward the 1st as it is less messy than the others. WDYT?

> Cache partitions by TopicIdPartition instead of TopicPartition
> --------------------------------------------------------------
>
>                 Key: KAFKA-16212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16212
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 3.7.0
>            Reporter: Gaurav Narula
>            Assignee: Omnia Ibrahim
>            Priority: Major
>
> From the discussion in [PR 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of {{TopicPartition}} to avoid ambiguity.
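
For illustration only, the lookup rules described under Proposal #2 in the comment could be sketched roughly as follows. This is not Kafka code: `PartitionCacheSketch`, its `put`, `get` and `getLatest` methods, and the simplified `TopicPartition` and `HostedPartition` case classes are hypothetical stand-ins. A plain `ConcurrentHashMap` takes the place of `Pool`, and `java.util.UUID` takes the place of Kafka's `Uuid`. The sketch also assumes that re-inserting an existing topic ID moves its entry to the tail of the queue, which the comment does not specify.

```scala
import java.util.UUID
import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}

object PartitionCacheSketch {
  // Hypothetical stand-ins for Kafka's TopicPartition and HostedPartition.
  final case class TopicPartition(topic: String, partition: Int)
  final case class HostedPartition(description: String)

  // One cache entry: an optional topic ID plus the hosted partition.
  type Entry = (Option[UUID], HostedPartition)

  // ConcurrentHashMap stands in for Kafka's Pool wrapper.
  private val allPartitions =
    new ConcurrentHashMap[TopicPartition, LinkedBlockingQueue[Entry]]()

  /** Adds an entry, keeping at most one entry per topic ID; the newest goes last. */
  def put(tp: TopicPartition, topicId: Option[UUID], hosted: HostedPartition): Unit = {
    allPartitions.putIfAbsent(tp, new LinkedBlockingQueue[Entry]())
    val queue = allPartitions.get(tp)
    // Remove-then-add must be atomic with respect to other writers.
    // Readers are not locked in this sketch.
    queue.synchronized {
      val it = queue.iterator()
      while (it.hasNext) if (it.next()._1 == topicId) it.remove()
      queue.add((topicId, hosted))
    }
  }

  // Copies the queue for a partition into an immutable list (empty if absent).
  private def snapshot(tp: TopicPartition): List[Entry] = {
    val queue = allPartitions.get(tp)
    if (queue == null) Nil
    else {
      val builder = List.newBuilder[Entry]
      val it = queue.iterator()
      while (it.hasNext) builder += it.next()
      builder.result()
    }
  }

  /** Topic-ID-aware lookup: only an entry with exactly this topic ID matches. */
  def get(tp: TopicPartition, topicId: UUID): Option[HostedPartition] =
    snapshot(tp).collectFirst { case (Some(id), hosted) if id == topicId => hosted }

  /** Lookup for callers without a topic ID: the last entry is treated as current. */
  def getLatest(tp: TopicPartition): Option[HostedPartition] =
    snapshot(tp).lastOption.map(_._2)
}
```

In this sketch a caller that knows the topic ID uses `get(tp, topicId)`, while a caller that only has a topic partition falls back to `getLatest(tp)`. The same "last entry wins" rule would apply to Proposal #1 if its queue is read as holding `(TopicIdPartition, HostedPartition)` pairs, which is an assumption about the intended type.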