Re: [DISCUSS] KIP-714: Client metrics and observability

Colin McCabe Wed, 21 Jul 2021 18:17:42 -0700

On Tue, Jun 29, 2021, at 07:22, Magnus Edenhill wrote:
> Den tors 17 juni 2021 kl 00:52 skrev Colin McCabe <[email protected]>:
> > A few critiques:
> >
> > - As I wrote above, I think this could benefit a lot by being split into
> > several RPCs. A registration RPC, a report RPC, and an unregister RPC seem
> > like logical choices.
> >
> 
> Responded to this in your previous mail, but in short I think a single
> request is sufficient and keeps the implementation complexity / state down.
>


Hi Magnus,

I still suspect that trying to do everything with a single RPC is more complex 
than using multiple RPCs.

Can you go into more detail about how the client learns what metrics it should 
send? This was the purpose of the "registration" step in my scheme above.

It seems quite awkward to combine an RPC for reporting metrics with and RPC for 
finding out what metrics are configured to be reported. For example, how would 
you build a tool to check what metrics are configured to be reported? Does the 
tool have to report fake metrics, just because there's no other way to get back 
that information? Seems wrong. (It would be a bit like combining createTopics 
and listTopics for "simplicity")

> > - I don't think the client should be able to choose its own UUID. This
> > adds complexity and introduces a chance that clients will choose an ID that
> > is not unique. We already have an ID that the client itself supplies
> > (clientID) so there is no need to introduce another such ID.
> >
> 
> The CLIENT_INSTANCE_ID (which is a combination of the client.id and a UUID)
> is actually generated by the receiving broker on first contact.
> The need for a new unique semi-random id is outlined in the KIP, but in
> short; the client.id is not unique, and we need something unique that still
> is prefix-matchable to the client.id so that we can add subscriptions
> either using prefix-matching of just the client.id (which may match one or
> more client instances), and exact matching which will match a one specific
> client instance.

Hmm... the client id is already sent in every RPC as part of the header. It's 
not necessary to send it again as part of one of the other RPC fields, right?

More generally, why does the client instance ID need to be prefix-matchable? 
That seems like an implementation detail of the metrics collection system used 
on the broker side. Maybe someone wants to group by things other than client 
IDs -- perhaps client versions, for instance. By the same argument, we should 
put the client version string in the client instance ID, since someone might 
want to group by that. Or maybe we should include the hostname, and the IP, 
and, and, and.... You see the issue here. I think we shouldn't get involved in 
this kind of decision -- if we just pass a UUID, the broker-side software can 
group it or prefix it however it wants internally.

> > - In general the schema seems to have a bad case of string-itis. UUID,
> > content type, and requested metrics are all strings. Since these messages
> > will be sent very frequently, it's quite costly to use strings for all
> > these things. We have a type for UUID, which uses 16 bytes -- let's use
> > that type for client instance ID, rather than a string which will be much
> > larger. Also, since we already send clientID in the message header, there
> > is no need to include it again in the instance ID.
> >
> 
> As explained above we need the client.id in the CLIENT_INSTANCE_ID. And I
> don't think the overhead of this one string per request is going to be much
> of an issue,
> typical metric push intervals are probably in the >60s range.
> If this becomes a problem we could use a per-connection identifier that the
> broker translates to the client instance id before pushing metrics upwards
> in the system.
> 

This is actually an interesting design question -- why not use a 
per-TCP-connection identifier, rather than a per-client-instance identifier? If 
we are grouping by other things anyway (clientID, principal, etc.) on the 
server side, do we need to maintain a per-process identifier rather than a 
per-connection one?

> 
> > - I think it would also be nice to have an enum or something for
> > AcceptedContentTypes, RequestedMetrics, etc. We know that new additions to
> > these categories will require KIPs, so it should be straightforward for the
> > project to just have an enum that allows us to communicate these as ints.
> >
> 
> I'm thinking this might be overly constraining. The broker doesn't parse or
> handle the received metrics data itself but just pushes it to the metrics
> plugin, using an enum would require a KIP and broker upgrade if the metrics 
> plugin
> supports a newer version of OTLP.
> It is probably better if we don't strictly control the metric format itself.
> 

Unfortunately, we have to strictly control the metrics format, because 
otherwise clients can't implement it. I agree that we don't need to specify how 
the broker-side code works, since that is pluggable. It's also reasonable for 
the clients to have pluggable extensions as well, but this KIP won't be of much 
use if we don't at least define a basic set of metrics that most clients can 
understand how to send. The open source clients will not implement anything 
more than what is specified in the KIP (or at least the AK one won't...)

> 
> 
> > - Can you talk about whether you are adding any new library dependencies
> > to the Kafka client? It seems like you'd want to add opencensus /
> > opentelemetry, if we are using that format here.
> >
> 
> Yeah, as we get closer to concensus more implementation specific details
> will be added to the KIP.
> 

I'm not sure if OpenCensus adds any value to this KIP, to be honest. Their 
primary focus was never on the format of the data being sent (in fact, the last 
time they checked, they left the format up to each OpenCensus implementation). 
That may have changed, but I think it still has limited usefulness to us, since 
we have our own format which we have to use anyway.

> 
> >
> > - Standard client resource labels: can we send these only in the
> > registration RPC?
> >
> 
> These labels are part of the serialized OTLP data, which means it would
> need to be unpacked and repacked (including compression) by the broker (or
> metrics plugin), which I believe is more costly than sending them for each 
> request.
> 

Hmm, that data is about 10 fields, most of which are strings. It certainly adds 
a lot of overhead to resend it each time.

I don't follow the comment about unpacking and repacking -- since the client 
registered with the broker it already knows all this information, so there's 
nothing to unpack or repack, except from memory. If it's more convenient to 
serialize it once rather than multiple times, that is an implementation detail 
of the broker side plugin, which we are not specifying here anyway.

best,
Colin

> Thanks,
> Magnus
> 
> >
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to