[
https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang updated KAFKA-12370:
----------------------------------
Labels: needs-kip new-streams-runtime-should-fix (was: needs-kip)
> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
> Key: KAFKA-12370
> URL: https://issues.apache.org/jira/browse/KAFKA-12370
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Josep Prat
> Priority: Major
> Labels: needs-kip, new-streams-runtime-should-fix
>
> Currently in KafkaStreams we have two groups of metadata getter:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return collection of {{StreamsMetadata}}, which only contains the partitions
> as active/standby, plus the hostInfo, but not exposing any task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}} that includes the hostInfos of active and
> standbys, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}}
> for active and standby tasks.
> All the above functions are used for interactive queries, but their exposed
> metadata are very different, and some use cases would need to have all
> client, thread, and task metadata to fulfill the feature development. At the
> same time, we may have a more dynamic "task -> thread" mapping in the future
> and also the embedded clients like consumers would not be per thread, but per
> client.
> ---------------
> Rethinking about the metadata, I feel we can have a more consistent hierarchy
> as the following:
> * {{StreamsMetadata}} represent the metadata for the client, which includes
> the set of {{ThreadMetadata}} for its existing thread and the set of
> {{TaskMetadata}} for active and standby tasks assigned to this client, plus
> client metadata including hostInfo, embedded client ids.
> * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for
> currently assigned tasks.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the
> partition id), the state, the corresponding sub-topology description
> (including the state store names, source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed
> from queryMetadataForKey) returns the set of {{StreamsMetadata}}, and
> {{localMetadata}} (renamed from localThreadMetadata) returns a single
> {{StreamsMetadata}}.
> * {{KeyQueryMetadata}} Class would be deprecated and replaced by
> {{TaskMetadata}}.
> To illustrate as an example, to find out who are the current active host /
> standby hosts of a specific store, we would call {{allMetadataForStore}}, and
> for each returned {{StreamsMetadata}} we loop over their contained
> {{TaskMetadata}} for active / standby, and filter by its corresponding
> sub-topology's description's contained store name.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)