Thanks for the concrete data. In essence, 199,000 partition metadata entries (MetadataResponsePartition) are unnecessarily sent over the network in this example.
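To sanity-check the figures, here is a quick back-of-envelope sketch. The ~50 bytes per entry and the 5-minute refresh interval are the estimates discussed in this thread, not measured values:

```python
# Back-of-envelope check of the numbers in this email. Assumptions (from the
# example in the thread): 1000 partitions, 200 consumers each owning 5
# partitions, ~50 bytes per MetadataResponsePartition entry, and a metadata
# refresh every 5 minutes.

partitions = 1000            # partitions in the example topic
consumers = 200              # consumers, each interested in 5 partitions
bytes_per_entry = 50         # rough per-entry size estimate, not measured
refresh_interval_s = 5 * 60  # refresh period assumed in the thread

entries_fetched = partitions * consumers           # every consumer fetches all
entries_needed = partitions                        # each entry needed only once
wasted_entries = entries_fetched - entries_needed  # 199,000 superfluous entries

total_waste_mb = wasted_entries * bytes_per_entry / 1_000_000    # ~9.95 MB
per_consumer_kb = (partitions - partitions // consumers) * bytes_per_entry / 1000
rate_kb_per_s = per_consumer_kb / refresh_interval_s             # ~0.166 KB/s

print(f"{wasted_entries} wasted entries, ~{total_waste_mb:.2f} MB total, "
      f"~{per_consumer_kb:.2f} KB per consumer, ~{rate_kb_per_s:.2f} KB/s")
```

The per-consumer rate is what the "0.16KB/s" figure below refers to: each consumer discards roughly 995 of the 1000 entries it fetches per refresh.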
Looking at the response object[0], I count about 50 bytes per entry. That's a total of 9.95MB of extra information going over the wire, around 50KB per consumer. In the happy path, the consumer fetches this data on every metadata refresh - that is to say, every 5 minutes. On leadership changes and rebalances this also gets refreshed, which can happen more often in a large cluster.

In any case, 50KB extra sent over the wire doesn't sound significant for a protocol that regularly moves many megabytes a second. In principle I agree it can be optimized. In practice I am wondering whether it'd be worth it to save what appears to be just 0.16KB/s of superfluous information here. As mentioned by Kirk, there are downsides to doing this too (mainly bug risk, imo). That's why my initial question was what motivated you to look toward this optimization. Any information on the impact/overhead you're seeing would be useful!

[0] - https://github.com/apache/kafka/blob/8b605bd3620268214a85c8a520cad22dec815358/clients/src/main/resources/common/message/MetadataResponse.json#L77-L90

Best,
Stan

On 2025/02/28 12:49:44 Michał Łowicki wrote:
> On Fri, Feb 28, 2025 at 10:10 AM Stanislav Kozlovski <
> stanislavkozlov...@apache.org> wrote:
>
> > > > It's certainly been a topic that's come up before. In certain
> > > > situations the current approach is a bit heavy-handed. The current
> > > > approach for fetching metadata has a number of benefits: it keeps the
> > > > protocol from being too chatty, which reduces load on the brokers and
> > > > makes maintaining a consistent view of the metadata on the client much
> > > > easier. There's a fairly substantial overhead with fetching metadata,
> > > > and batching it in a single request eliminates a lot of edge cases.
> >
> > My understanding is that the substantial overhead of the metadata request
> > comes precisely from the total number of partitions the broker needs to
> > iterate over and build objects for.
> > (please correct me if I'm wrong and it's something non-obvious)
> >
> > If that's true, then the fewer partitions it has to do that for, the less
> > overhead there would be?
> >
> > As for the edge cases, I am not aware of them, but I can certainly
> > imagine something like the old consumer protocol, where the client
> > chooses the assignment, being prone to edge cases from incomplete
> > metadata. Perhaps subset partition metadata fetching can be employed
> > strategically in cases where that risk is lower.
> >
> > --
> >
> > Michal, out of curiosity, what led you to this question? Do you see any
> > substantial overhead in the metadata path on the clients/brokers because
> > of this unnecessary fetching?
> >
> > --
> >
> > re: chattiness - do we all define chattiness by the number of requests
> > per second? Michal, you mention fetching the subset could reduce
> > chattiness but I don't see how that could happen. By definition, if you
> > send less data per response, then chances are you'll need to send more
> > requests once you want more data. Am I missing anything?
>
> amount of data transferred.
>
> We've an in-house client, and frequently for topics with hundreds or
> thousands of partitions, the consumption is spread across a significant
> number of consumers where each one is interested in a few partitions.
>
> 1000 partitions, 200 consumers where each gets 5 partitions.
>
> Currently each one on start needs to fetch metadata for all topics, so we
> retrieve 1000 * 200 partitions' metadata (1000 requests) from the brokers
> where 1000 would be enough.
>
> > On 2025/02/28 07:56:29 Michał Łowicki wrote:
> > > On Thu, Feb 27, 2025 at 5:39 PM Kirk True <k...@kirktrue.pro> wrote:
> > >
> > > > Hi Michał,
> > > >
> > > > On Thu, Feb 27, 2025, at 3:44 AM, Michał Łowicki wrote:
> > > > > Hi there!
> > > > >
> > > > > Is there any reason why Metadata requests
> > > > > <https://kafka.apache.org/protocol.html#The_Messages_Metadata> do
> > > > > not support fetching metadata for subsets of the partitions? A
> > > > > client may be interested in only e.g. 1 partition, but a topic may
> > > > > have many, so most of the fetched data isn't really used.
> > > >
> > > > It's certainly been a topic that's come up before. In certain
> > > > situations the current approach is a bit heavy-handed. The current
> > > > approach for fetching metadata has a number of benefits: it keeps the
> > > > protocol from being too chatty, which reduces load on the brokers and
> > > > makes maintaining a consistent view of the metadata on the client much
> > > > easier. There's a fairly substantial overhead with fetching metadata,
> > > > and batching it in a single request eliminates a lot of edge cases.
> > >
> > > Sure, I'm rather thinking about an opt-in option to the protocol where,
> > > if specified, the metadata response would contain metadata only for a
> > > specified set of partitions (otherwise, as today, metadata for all of
> > > them). This would cover the cases where consumers need to know the
> > > metadata for only a small portion of the partitions. It would then be
> > > cheaper for the broker to handle such requests and craft responses, and
> > > the protocol would actually be less chatty in those cases.
> > >
> > > > As always, further discussion and suggestions for improvements in
> > > > this area are welcomed :)
> > > >
> > > > Thanks,
> > > > Kirk