On Fri, Feb 28, 2025 at 2:14 PM Stanislav Kozlovski <stanislavkozlov...@apache.org> wrote:
> Thanks for the concrete data.
>
> In essence, 199,000 partition metadata entries (MetadataResponsePartition)
> are unnecessarily sent over the network in this example.
>
> Looking at the response object[0], I count about 50 bytes per entry.
> That's a total of 9.95MB of extra information going over the wire, around
> 50KB per consumer.
>
> In the happy path, the consumer fetches this data on every metadata
> refresh - that is to say, every 5 minutes. On leadership changes and
> rebalances this also gets refreshed, which can happen more often in a
> large cluster.
>
> In any case, 50KB extra sent over the wire doesn't sound significant for a
> protocol that regularly moves many megabytes a second.
>
> In principle I agree it can be optimized. In practice I am wondering
> whether it'd be worth it to save on what appears to be just 0.16KB/s of
> superfluous information here. As mentioned by Kirk, there are downsides to
> doing this too (mainly bug risk, imo).
>
> That's why my initial question was what motivated you to look toward this
> optimization. Any information on impact/overhead you're seeing would be
> useful!

Yeah, during the normal course of operation that overhead is nothing compared
to how much actual data is produced or consumed. The issue arises in our case
when there are a lot of re-assignments at once: each consumer currently
fetches fresh metadata when its assignment is updated. Then we have a spike
of requests, where fetching metadata for only a subset of partitions could
reduce the amount of work needed from the broker (during those spikes,
handling metadata requests was clearly visible in the brokers' profiles).
During re-assignments we could try to rely on cached data on our side, but
that isn't always possible (if the consumer wasn't previously managing
anything from a topic, its cache will be empty), so asking for a subset of
partitions looked like a nice addition to the protocol.
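To make the gap concrete, here is a minimal sketch against the stock Java
consumer (our in-house client differs, but the metadata pattern is the same
idea; the topic, group and bootstrap values below are made up). Even when a
member ends up owning only a handful of partitions, the client fetches and
caches a PartitionInfo entry for every partition of the topic:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class MetadataFootprint {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "metadata-footprint-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("big-topic"));  // e.g. a topic with 1000 partitions
            consumer.poll(Duration.ofSeconds(5));      // join the group; may take a few polls to get an assignment

            // Partitions this member actually owns after the rebalance, e.g. 5 of them...
            int assigned = consumer.assignment().size();

            // ...versus the partition metadata the client fetched and cached for the
            // topic: one PartitionInfo (leader, replicas, ISR) per partition, e.g. all 1000.
            List<PartitionInfo> all = consumer.partitionsFor("big-topic");

            System.out.printf("assigned=%d, partition metadata entries=%d%n", assigned, all.size());
        }
    }
}

With 200 such consumers on a 1000-partition topic, that's the 1000 * 200
entries from the example further down this thread, of which only 1000 are
ever used.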
> [0] -
> https://github.com/apache/kafka/blob/8b605bd3620268214a85c8a520cad22dec815358/clients/src/main/resources/common/message/MetadataResponse.json#L77-L90
>
> Best,
> Stan
>
> On 2025/02/28 12:49:44 Michał Łowicki wrote:
> > On Fri, Feb 28, 2025 at 10:10 AM Stanislav Kozlovski
> > <stanislavkozlov...@apache.org> wrote:
> >
> > > > > It's certainly been a topic that's come up before. In certain
> > > > > situations the current approach is a bit heavy-handed. The
> > > > > current approach for fetching metadata has a number of benefits:
> > > > > it keeps the protocol from being too chatty, which reduces load
> > > > > on the brokers and makes maintaining a consistent view of the
> > > > > metadata on the client much easier. There's a fairly substantial
> > > > > overhead with fetching metadata, and batching it in a single
> > > > > request eliminates a lot of edge cases.
> > >
> > > My understanding is that the substantial overhead of the metadata
> > > request comes precisely from the total number of partitions the
> > > broker needs to iterate over and build objects for. (Please correct
> > > me if I'm wrong and it's something non-obvious.)
> > >
> > > If that's true, then the fewer partitions it has to do that for, the
> > > less overhead there would be?
> > >
> > > As for the edge cases, I am not aware of them, but I can certainly
> > > imagine something like the old consumer protocol, where the client
> > > chooses the assignment, being prone to edge cases from incomplete
> > > metadata. Perhaps subset partition metadata fetching could be
> > > employed strategically in cases where that risk is lower.
> > >
> > > --
> > >
> > > Michal, out of curiosity, what led you to this question? Do you see
> > > any substantial overhead in the metadata path on the clients/brokers
> > > because of this unnecessary fetching?
> > >
> > > --
> > >
> > > re: chattiness - do we all define chattiness by the number of
> > > requests per second?
> > > Michal, you mention fetching the subset could reduce chattiness, but
> > > I don't see how that could happen. By definition, if you send less
> > > data per response, then the chances are you'll need to send more
> > > requests once you want more data. Am I missing anything?
> >
> > By the amount of data transferred.
> >
> > We have an in-house client, and frequently, for topics with hundreds
> > or thousands of partitions, the consumption is spread across a
> > significant number of consumers where each one is interested in only a
> > few partitions.
> >
> > Say 1000 partitions and 200 consumers, where each gets 5 partitions.
> >
> > Currently each one needs to fetch metadata for all of the topic's
> > partitions on start, so we retrieve 1000 * 200 partition metadata
> > entries (1000 requests) from the brokers, where 1000 entries would be
> > enough.
> >
> > > On 2025/02/28 07:56:29 Michał Łowicki wrote:
> > > > On Thu, Feb 27, 2025 at 5:39 PM Kirk True <k...@kirktrue.pro> wrote:
> > > >
> > > > > Hi Michał,
> > > > >
> > > > > On Thu, Feb 27, 2025, at 3:44 AM, Michał Łowicki wrote:
> > > > > > Hi there!
> > > > > >
> > > > > > Is there any reason why Metadata requests
> > > > > > <https://kafka.apache.org/protocol.html#The_Messages_Metadata>
> > > > > > do not support fetching metadata for subsets of the partitions?
> > > > > > A certain client may be interested in e.g. only 1 partition,
> > > > > > but the topic may have many, so most of the fetched data isn't
> > > > > > really used.
> > > > >
> > > > > It's certainly been a topic that's come up before. In certain
> > > > > situations the current approach is a bit heavy-handed. The
> > > > > current approach for fetching metadata has a number of benefits:
> > > > > it keeps the protocol from being too chatty, which reduces load
> > > > > on the brokers and makes maintaining a consistent view of the
> > > > > metadata on the client much easier. There's a fairly substantial
> > > > > overhead with fetching metadata, and batching it in a single
> > > > > request eliminates a lot of edge cases.
> > > >
> > > > Sure, I'm rather thinking about an opt-in option in the protocol
> > > > where, if specified, the metadata response would contain metadata
> > > > only for the specified set of partitions (otherwise, as today,
> > > > metadata for all of them), to cover the cases where consumers need
> > > > to know metadata for only a small portion of the partitions. The
> > > > broker would then have less work to do handling such requests and
> > > > crafting responses, and the protocol would actually be less chatty
> > > > in those cases.
> > > >
> > > > > As always, further discussion and suggestions for improvements
> > > > > in this area are welcomed :)
> > > > >
> > > > > Thanks,
> > > > > Kirk
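For what it's worth, the opt-in discussed in the 07:56 message above could
surface on the request side as something roughly like the sketch below. This
is purely hypothetical (none of these names exist in Kafka today); it is only
meant to illustrate "metadata for a specified set of partitions, otherwise
all of them":

import java.util.List;
import java.util.Optional;

// Hypothetical request-side shape for the opt-in: per topic, an optional list
// of partition indexes. Absent means today's behaviour (metadata for every
// partition); present means the broker only builds entries for the listed ones.
public record TopicMetadataQuery(String topic, Optional<List<Integer>> partitions) {

    // Today's behaviour: metadata for every partition of the topic.
    public static TopicMetadataQuery allPartitions(String topic) {
        return new TopicMetadataQuery(topic, Optional.empty());
    }

    // Opt-in behaviour: only the partitions the client cares about,
    // e.g. onlyPartitions("big-topic", List.of(1, 7, 42)).
    public static TopicMetadataQuery onlyPartitions(String topic, List<Integer> partitions) {
        return new TopicMetadataQuery(topic, Optional.of(List.copyOf(partitions)));
    }
}

Keeping the field optional would preserve today's semantics for existing
clients, while a client that already knows which partitions it needs could
ask the broker to build far fewer MetadataResponsePartition entries.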