On Fri, Feb 28, 2025 at 2:14 PM Stanislav Kozlovski <
stanislavkozlov...@apache.org> wrote:

> Thanks for the concrete data.
>
> In essence, 199,000 partition metadata entries (MetadataResponsePartition)
> are unnecessarily sent over the network in this example.
>
> Looking at the response object[0], I count about 50 bytes per entry.
> That's a total of 9.95MB of extra information going over the wire, around
> 50KB per consumer.
>
> In the happy path, the consumer fetches this data on every metadata
> refresh - that is to say every 5 minutes. On leadership changes and
> rebalances this also gets refreshed, which can happen more often in a large
> cluster.
>
> In any case, 50KB extra sent over the wire doesn't sound significant for a
> protocol that regularly moves many megabytes a second.
>
> In principle I agree it can be optimized. In practice I am wondering
> whether it'd be worth it to save on what just appears to be 0.16KB/s of
> superfluous information here. As mentioned by Kirk, there are downsides to
> doing this too. (mainly bug risk imo)
>
> That's why my initial question was what motivated you to look toward this
> optimization. Any information on impact/overhead you're seeing would be
> useful!
>
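
For anyone who wants to re-check the arithmetic quoted above, here is a rough
sketch. The inputs are the figures from this thread (1000 partitions, 200
consumers with 5 partitions each, about 50 bytes per entry, a refresh every 5
minutes); the variable names are only illustrative.

```python
# Back-of-the-envelope check of the overhead figures quoted in this thread.
PARTITIONS = 1000
CONSUMERS = 200
PARTITIONS_PER_CONSUMER = 5
BYTES_PER_ENTRY = 50           # rough per-entry estimate taken from the thread
REFRESH_INTERVAL_S = 5 * 60    # happy-path metadata refresh interval

# Every consumer currently receives metadata for every partition.
entries_sent = PARTITIONS * CONSUMERS
# Each consumer only needs the partitions it is assigned.
entries_needed = PARTITIONS_PER_CONSUMER * CONSUMERS
extra_entries = entries_sent - entries_needed

extra_bytes_total = extra_entries * BYTES_PER_ENTRY
extra_bytes_per_consumer = extra_bytes_total / CONSUMERS
extra_bytes_per_second = extra_bytes_per_consumer / REFRESH_INTERVAL_S

print(f"extra entries:        {extra_entries}")
print(f"extra bytes (total):  {extra_bytes_total / 1e6:.2f} MB")
print(f"extra per consumer:   {extra_bytes_per_consumer / 1e3:.2f} KB")
print(f"extra rate/consumer:  {extra_bytes_per_second / 1e3:.3f} KB/s")
```

This only covers the steady-state refresh; it says nothing about the spikes
during re-assignments described below.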

Yeah, during normal operation that overhead is nothing compared to how much
actual data is produced or consumed. The issue arises in our case when there
are a lot of re-assignments at once: each consumer currently fetches fresh
metadata whenever its assignment is updated. That gives us a spike of
requests, and fetching metadata for only a subset of partitions could reduce
the amount of work needed on the broker (during those spikes, handling
metadata requests was visible in the Kafka brokers' profiles).

During re-assignments we could rely on cached data on our side, but that
isn't always possible (if the consumer wasn't previously managing anything
from a topic, the cache will be empty), so asking for a subset of partitions
looked like a nice addition to the protocol.
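
To make the idea concrete, below is a minimal sketch of the opt-in semantics.
It is not the Kafka protocol or broker code: `PartitionMetadata`,
`build_topic_metadata` and the `requested` parameter are hypothetical names
used only for illustration, and skipping unknown partition ids is an
assumption (a real protocol change would presumably need an error code there).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PartitionMetadata:
    """Hypothetical, heavily simplified stand-in for one partition entry."""
    partition: int
    leader: int


def build_topic_metadata(
    all_partitions: Dict[int, PartitionMetadata],
    requested: Optional[List[int]] = None,
) -> List[PartitionMetadata]:
    """Return entries for all partitions, or only for the requested ones.

    requested=None models today's behaviour (every partition is returned).
    A list models the hypothetical opt-in field: only those partitions are
    returned, duplicates are collapsed and unknown ids are skipped.
    """
    if requested is None:
        return [all_partitions[p] for p in sorted(all_partitions)]
    return [
        all_partitions[p]
        for p in sorted(set(requested))
        if p in all_partitions
    ]


# A topic with 1000 partitions, leaders spread over 3 hypothetical brokers.
topic = {p: PartitionMetadata(partition=p, leader=p % 3) for p in range(1000)}

full = build_topic_metadata(topic)                        # current behaviour
subset = build_topic_metadata(topic, requested=[7, 42, 42, 5000])

print(len(full), [m.partition for m in subset])
```

Leaving the field unset keeps the current behaviour, so existing clients
would be unaffected; only clients that opt in would get the smaller response.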


>
> [0] -
> https://github.com/apache/kafka/blob/8b605bd3620268214a85c8a520cad22dec815358/clients/src/main/resources/common/message/MetadataResponse.json#L77-L90
>
> Best,
> Stan
>
> On 2025/02/28 12:49:44 Michał Łowicki wrote:
> > On Fri, Feb 28, 2025 at 10:10 AM Stanislav Kozlovski <
> > stanislavkozlov...@apache.org> wrote:
> >
> > > > > It's certainly been a topic that's come up before. In certain
> > > situations
> > > > > the current approach is a bit heavy-handed. The current approach
> for
> > > > > fetching metadata has a number of benefits: it keeps the protocol
> from
> > > > > being too chatty, which reduces load on the brokers and makes
> > > maintaining a
> > > > > > consistent view of the metadata on the client much easier. There's a
> > > fairly
> > > > > substantial overhead with fetching metadata and batching it in a
> single
> > > > > request eliminates a lot of edge cases.
> > >
> > > My understanding is that the substantial overhead of the metadata
> request
> > > comes precisely from the total number of partitions the broker needs to
> > > iterate over and build objects for. (please correct me if I'm wrong and
> > > it's something non-obvious)
> > >
> > > If that's true, then the fewer partitions it has to do that for - the
> less
> > > overhead there would be?
> > >
> > > As for the edge cases, I am not aware of them but can certainly imagine
> > > something like the old consumer protocol where the client chooses
> > > assignment being prone to edge cases from incomplete metadata. Perhaps the
> > > subset partition metadata fetching can be employed strategically in
> cases
> > > where that risk is lower.
> > >
> > > --
> > >
> > > Michal, out of curiosity, what led you to this question? Do you see
> any
> > > substantial overhead in the metadata path on the clients/brokers
> because of
> > > this unnecessary fetching?
> > >
> > > --
> > >
> > > re: chattiness - do we all define chattiness by the number of requests
> per
> > > second?
> > > Michal, you mention fetching the subset could reduce chattiness but I
> > > don't see how that could happen. By definition if you send less data
> per
> > > response, then the chances are you'll need to send more requests
> once
> > > you want more data. Am I missing anything?
> > >
> >
> > I had in mind the amount of data transferred.
> >
> > We have an in-house client and frequently for topics with hundreds or
> > thousands of partitions, the consumption is spread across a significant
> > number of consumers where each one is interested in a few partitions.
> >
> > 1000 partitions, 200 consumers where each gets 5 partitions.
> >
> > Currently each one on start needs to fetch metadata for all partitions of
> > the topic, so we retrieve 1000 * 200 partition metadata entries (200
> > requests) from the brokers where 1000 would be enough.
> >
> >
> > >
> > > On 2025/02/28 07:56:29 Michał Łowicki wrote:
> > > > On Thu, Feb 27, 2025 at 5:39 PM Kirk True <k...@kirktrue.pro> wrote:
> > > >
> > > > > Hi Michał,
> > > > >
> > > > > On Thu, Feb 27, 2025, at 3:44 AM, Michał Łowicki wrote:
> > > > > > Hi there!
> > > > > >
> > > > > > > Is there any reason why Metadata requests
> > > > > > > <https://kafka.apache.org/protocol.html#The_Messages_Metadata>
> > > > > > > do not support fetching metadata for subsets of the partitions?
> > > > > > > If a certain client is interested in only e.g. 1 partition but
> > > > > > > the topic has many, most of the fetched data isn't really used.
> > > > > >
> > > > >
> > > > > It's certainly been a topic that's come up before. In certain
> > > situations
> > > > > the current approach is a bit heavy-handed. The current approach
> for
> > > > > fetching metadata has a number of benefits: it keeps the protocol
> from
> > > > > being too chatty, which reduces load on the brokers and makes
> > > maintaining a
> > > > > > consistent view of the metadata on the client much easier. There's a
> > > fairly
> > > > > substantial overhead with fetching metadata and batching it in a
> single
> > > > > request eliminates a lot of edge cases.
> > > > >
> > > >
> > > > Sure, I'm rather thinking about an opt-in option in the protocol
> > > > where, if specified, the metadata response would contain metadata
> > > > only for a specified set of partitions (otherwise, as today, metadata
> > > > for all of them). This would cover the cases where consumers need
> > > > metadata for only a small portion of partitions. The broker would
> > > > then have less work to do to handle such requests and craft
> > > > responses, and the protocol would actually be less chatty in those
> > > > cases.
> > > >
> > > >
> > > > >
> > > > > As always, further discussion and suggestions for improvements in
> this
> > > > > area are welcomed :)
> > > > >
> > > > > Thanks,
> > > > > Kirk
> > > >
> > >
> >
>
