On Fri, Feb 28, 2025 at 10:10 AM Stanislav Kozlovski <stanislavkozlov...@apache.org> wrote:

> > > It's certainly been a topic that's come up before. In certain situations
> > > the current approach is a bit heavy-handed. The current approach for
> > > fetching metadata has a number of benefits: it keeps the protocol from
> > > being too chatty, which reduces load on the brokers and makes maintaining a
> > > consistent view of the metadata on the client much easier. There's a fairly
> > > substantial overhead with fetching metadata and batching it in a single
> > > request eliminates a lot of edge cases.
>
> My understanding is that the substantial overhead of the metadata request
> comes precisely from the total number of partitions the broker needs to
> iterate over and build objects for. (please correct me if I'm wrong and
> it's something non-obvious)
>
> If that's true, then the fewer partitions it has to do that for, the less
> overhead there would be?
>
> As for the edge cases, I am not aware of them but can certainly imagine
> something like the old consumer protocol, where the client chooses the
> assignment, being prone to edge cases from incomplete metadata. Perhaps the
> subset partition metadata fetching can be employed strategically in cases
> where that risk is lower.
>
> --
>
> Michal, out of curiosity, what led you to this question? Do you see any
> substantial overhead in the metadata path on the clients/brokers because of
> this unnecessary fetching?
>
> --
>
> re: chattiness - do we all define chattiness by the number of requests per
> second?
> Michal, you mention fetching the subset could reduce chattiness but I
> don't see how that could happen. By definition if you send less data per
> response, then the chances are you'll need to send more requests once
> you want more data. Am I missing anything?
> amount of data transferred.

We have an in-house client, and frequently, for topics with hundreds or
thousands of partitions, consumption is spread across a significant number of
consumers, each of which is interested in only a few partitions. For example:
1000 partitions and 200 consumers, where each consumer gets 5 partitions.
Currently each one needs to fetch metadata for all topics on start, so we
retrieve 1000 * 200 partitions' metadata (1000 requests) from the brokers
where 1000 would be enough.

>
> On 2025/02/28 07:56:29 Michał Łowicki wrote:
> > On Thu, Feb 27, 2025 at 5:39 PM Kirk True <k...@kirktrue.pro> wrote:
> >
> > > Hi Michał,
> > >
> > > On Thu, Feb 27, 2025, at 3:44 AM, Michał Łowicki wrote:
> > > > Hi there!
> > > >
> > > > Is there any reason why Metadata requests
> > > > <https://kafka.apache.org/protocol.html#The_Messages_Metadata> do not
> > > > support fetching metadata for subsets of the partitions? If a certain
> > > > client is interested in only, e.g., 1 partition but the topic has many,
> > > > most of the fetched data isn't really used.
> > > >
> > >
> > > It's certainly been a topic that's come up before. In certain situations
> > > the current approach is a bit heavy-handed. The current approach for
> > > fetching metadata has a number of benefits: it keeps the protocol from
> > > being too chatty, which reduces load on the brokers and makes maintaining a
> > > consistent view of the metadata on the client much easier. There's a fairly
> > > substantial overhead with fetching metadata and batching it in a single
> > > request eliminates a lot of edge cases.
> > >
> >
> > Sure, I'm rather thinking about an opt-in option to the protocol where, if
> > specified, the metadata response would contain metadata only for a specified
> > set of partitions (otherwise, as today, metadata for all of them). This would
> > cover the cases where consumers need to know metadata for only a small
> > portion of partitions. Then there would be less work for the broker in
> > handling such requests and crafting responses, and the protocol would
> > actually be less chatty in those cases.
> >
> >
> > >
> > > As always, further discussion and suggestions for improvements in this
> > > area are welcomed :)
> > >
> > > Thanks,
> > > Kirk
> > >
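
For reference, the arithmetic in Michał's example earlier in this thread (1000 partitions, 200 consumers, 5 partitions per consumer) can be sketched roughly as below. This is only an illustration: the function name and its parameters are hypothetical, it counts partition-metadata entries rather than bytes or requests, and the "subset" case assumes the opt-in partition filter proposed in the thread, which is not part of the Kafka protocol discussed here.

```python
def partition_entries_fetched(partitions: int, consumers: int, per_consumer: int) -> dict:
    """Count partition-metadata entries returned to all consumers at startup.

    Hypothetical helper for illustration only; not a Kafka API.
    """
    # Current behaviour described in the thread: every consumer receives
    # metadata for every partition of the topic.
    full = partitions * consumers
    # Assumed opt-in subset fetch: each consumer asks only for the
    # partitions it actually consumes.
    subset = per_consumer * consumers
    return {"full": full, "subset": subset, "ratio": full / subset}


result = partition_entries_fetched(partitions=1000, consumers=200, per_consumer=5)
print(result)  # {'full': 200000, 'subset': 1000, 'ratio': 200.0}
```

Under these assumptions the brokers build 200 times more partition entries than the consumers actually use, which is the overhead the opt-in proposal is aimed at.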