Some comments about subscriptionId.

On Mon, Sep 20, 2021 at 11:41 AM Colin McCabe <cmcc...@apache.org> wrote:
> On Tue, Sep 14, 2021, at 00:47, Magnus Edenhill wrote:
> > Thanks for your feedback Colin, see my updated proposal below.
> > ...
>
> Hi Magnus,
>
> Thanks for the update.
>
> >
> > Splitting up the API into separate data and control requests makes sense.
> > With a split we would have one API for querying the broker for the
> > configured metrics subscriptions, and one API for pushing the collected
> > metrics to the broker.
> >
> > A mechanism is still needed to notify the client when the subscription is
> > changed; I’ve added a SubscriptionId for this purpose (which could be a
> > checksum of the configured metrics subscription). This id is sent to the
> > client along with the metrics subscription, and the client sends it back
> > to the broker when pushing metrics. If the broker finds the pushed
> > subscription id to differ from what is expected, it will return an error
> > to the client, which triggers the client to retrieve the new subscribed
> > metrics and an updated subscription id. The generation of the
> > subscriptionId is opaque to the client.
> >

ApiVersion is not a good example: the ApiVersion is itself effectively acting
as an identifier, since the client carries this information. Forcing a
disconnect from the server side is quite heavy-handed. IMHO, this behavior is
really part of the protocol. Adding a subscriptionId is relatively simple and
straightforward.

> Hmm, SubscriptionId seems rather complex. We don't have this kind of
> complicated machinery for changing ApiVersions, and that is something that
> can also change over time, and which affects the clients.
>
> Changing the configured metrics should be extremely rare. In this case,
> why don't we just close all connections on the broker side? Then the
> clients can re-connect and re-fetch the information about the metrics
> they're supposed to send.
>
> >
> > Something like this:
> >
> > // Get the configured metrics subscription.
> > GetTelemetrySubscriptionsRequest {
> >    StrNull ClientInstanceId  // Null on first invocation to retrieve a
> >                              // newly generated instance id from the broker.
> > }
>
> It seems like the goal here is to have the client register itself, so that
> we can tell if this is an old client reconnecting. If that is the case,
> then I suggest renaming the RPC to RegisterClient.
>
> I think we need a better name than "clientInstanceId" since that name is
> very similar to "clientId." Perhaps something like originId? Or clientUuid?
> Let's also use UUID here rather than a string.
>
> > 6.
> > PushTelemetryRequest {
> >    ClientInstanceId = f00d-feed-deff-ceff-ffff-….,
> >    SubscriptionId = 0x234adf34,
> >    ContentType = OTLPv8|ZSTD,
> >    Terminating = False,
> >    Metrics = …  // zstd-compressed OTLPv08-protobuf-serialized metrics
> > }
>
> It's not necessary for the client to re-send its client instance ID here,
> since it already registered with RegisterClient. If the TCP connection
> dropped, it will have to re-send RegisterClient anyway. SubscriptionId we
> should get rid of, as I said above.
>
> I don't see the need for protobufs. Why not just use Kafka's own
> serialization mechanism? As much as possible, we should try to avoid
> creating "turduckens" of protocol X containing a buffer serialized with
> protocol Y, containing a protocol serialized with protocol Z. These aren't
> conducive to a good implementation, and make it harder for people to write
> clients. Just use Kafka's RPC protocol (with optional fields if you wish).
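To make the subscription flow concrete, here is a rough client-side sketch of
what Magnus describes above. All type, method and error names below are
illustrative only (not the KIP's API), and the sketch is independent of the
protobuf-vs-Kafka-serialization question:

  // Illustrative sketch of the client-side telemetry loop; none of these
  // types exist in the Kafka clients today.
  import java.util.List;
  import java.util.UUID;

  public class TelemetryReporterSketch {

      // What a GetTelemetrySubscriptions (or RegisterClient) response would carry.
      static class Subscriptions {
          UUID clientInstanceId;     // assigned by the broker on the first request
          int subscriptionId;        // e.g. a checksum of the subscribed metrics
          List<String> metricNames;  // the metrics this client should collect
          int pushIntervalMs;
      }

      enum PushResult { OK, UNKNOWN_SUBSCRIPTION_ID }  // error name is made up

      interface BrokerConnection {
          Subscriptions getTelemetrySubscriptions(UUID clientInstanceId);
          PushResult pushTelemetry(UUID clientInstanceId, int subscriptionId,
                                   byte[] metrics);
      }

      private Subscriptions current;

      void run(BrokerConnection broker) throws InterruptedException {
          // First call passes a null instance id; the broker generates one.
          current = broker.getTelemetrySubscriptions(null);
          while (true) {
              byte[] payload = collectAndSerialize(current.metricNames);
              PushResult result = broker.pushTelemetry(
                      current.clientInstanceId, current.subscriptionId, payload);
              if (result == PushResult.UNKNOWN_SUBSCRIPTION_ID) {
                  // The broker-side subscription changed: re-fetch the metric
                  // list and the new subscription id, keeping the instance id.
                  current = broker.getTelemetrySubscriptions(current.clientInstanceId);
              }
              Thread.sleep(current.pushIntervalMs);
          }
      }

      private byte[] collectAndSerialize(List<String> metricNames) {
          return new byte[0];  // placeholder for metrics collection + serialization
      }
  }

Whether the instance id comes from a separate RegisterClient RPC, as Colin
suggests, or from the subscriptions request itself doesn't change this loop
much.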
>
> If we do compression on Kafka RPC, I would prefer that we do it in a more
> generic way that applies to all control messages, not just this one. I also
> doubt we need to support lots and lots of different compression codecs, at
> first at least.
>
> Another thing I'd like to understand is whether we truly need "terminating"
> (or anything like it). I'm still confused about how the backend could use
> this. Keep in mind that we may receive it on multiple brokers (or not
> receive it at all). We may receive more stuff about client XYZ from broker 1
> after we have already received a "terminated" for client XYZ from broker 2.
>
> > If the broker connection goes down or the connection is to be used for
> > other purposes (e.g., blocking FetchRequests), the client will send
> > PushTelemetryRequests to any other broker in the cluster, using the same
> > ClientInstanceId and SubscriptionId as received in the latest
> > GetTelemetrySubscriptionsResponse.
> >
> > While the subscriptionId may change during the lifetime of the client
> > instance (when metric subscriptions are updated), the ClientInstanceId is
> > only acquired once and must not change (as it is used to identify the
> > unique client instance).
> > ...
> > What we do want though is the ability to single out a specific client
> > instance to give it a more fine-grained subscription for troubleshooting,
> > and we can do that with the current proposal by matching solely on the
> > CLIENT_INSTANCE_ID.
> > In other words: all clients will have the same standard metrics
> > subscription, but specific client instances can have alternate
> > subscriptions.
>
> That makes sense, and gives a good reason why we might want to couple
> finding the metrics info to passing the client UUID.
>
> > The metrics collector/tsdb/whatever will need to identify a single client
> > instance, regardless of which broker received the metrics.
> > The chapter on CLIENT_INSTANCE_ID motivates why we need a unique
> > identifier, basically because neither clientId, principal, nor remote
> > address:port, etc., can be used to identify a single client instance.
> >
>
> Thanks for the background. I agree that a UUID is useful here.
>
> This also gives additional reasons why using a UUID rather than a free-form
> string is desirable (many databases can handle numbers more efficiently
> than strings).
>
> > > Unfortunately, we have to strictly control the metrics format, because
> > > otherwise clients can't implement it. I agree that we don't need to
> > > specify how the broker-side code works, since that is pluggable. It's
> > > also reasonable for the clients to have pluggable extensions as well,
> > > but this KIP won't be of much use if we don't at least define a basic
> > > set of metrics that most clients can understand how to send. The open
> > > source clients will not implement anything more than what is specified
> > > in the KIP (or at least the AK one won't...)
> > >
> >
> > Makes sense, in the updated proposal above I changed ContentType to a
> > bitmask.
> >
>
> Hmm. You might have forgotten to save your changes. It's still listed as a
> string in the KIP.
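On the ContentType point: the bitmask form could be as simple as the sketch
below (bit assignments and names are purely illustrative, not what the KIP
specifies), letting a single field carry both the serialization format and
the compression codec, e.g. OTLPv8 | ZSTD from the example above:

  // Illustrative only: encode payload format and compression in one bitmask.
  public final class ContentType {
      public static final int OTLP_V08  = 1;       // bit 0: OTLP v0.8 payload
      public static final int KAFKA_RPC = 1 << 1;  // bit 1: Kafka-native serialization
      public static final int ZSTD      = 1 << 4;  // bit 4: zstd-compressed
      public static final int LZ4       = 1 << 5;  // bit 5: lz4-compressed

      private ContentType() {}

      public static boolean isZstd(int contentType) {
          return (contentType & ZSTD) != 0;
      }
  }

  // Usage: int contentType = ContentType.OTLP_V08 | ContentType.ZSTD;

Whether compression really belongs in this field, or is handled generically
for all RPCs as Colin prefers, is a separate question.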
> > > I'm not sure if OpenCensus adds any value to this KIP, to be honest.
> > > Their primary focus was never on the format of the data being sent (in
> > > fact, the last time I checked, they left the format up to each
> > > OpenCensus implementation). That may have changed, but I think it still
> > > has limited usefulness to us, since we have our own format which we
> > > have to use anyway.
> > >
> >
> > Oh, I meant consensus as in kafka-dev agreement :)
> >
> > Feng is looking into the implementation details of the Java client and
> > will update the KIP with regards to dependencies.
>
> I still don't understand what value using the OpenTelemetry format is
> giving us here, as opposed to using Kafka's own format. I guess we will
> need to talk more about this.
>
> > > Hmm, that data is about 10 fields, most of which are strings. It
> > > certainly adds a lot of overhead to resend it each time.
> > >
> > > I don't follow the comment about unpacking and repacking -- since the
> > > client registered with the broker it already knows all this
> > > information, so there's nothing to unpack or repack, except from
> > > memory. If it's more convenient to serialize it once rather than
> > > multiple times, that is an implementation detail of the broker-side
> > > plugin, which we are not specifying here anyway.
> > >
> >
> > The current proposal is pretty much stateless on the broker: it does not
> > need to hold any state for a client (instance), and no state
> > synchronization is needed between brokers in the cluster. This allows a
> > client to seamlessly send metrics to any broker it wants and keeps the
> > API overhead down (no need to re-register when switching brokers, for
> > instance).
> >
> > We could remove the labels that are already available to the broker on a
> > per-request basis or that it already maintains state for:
> > - client_id
> > - client_instance_id
> > - client_software_*
> >
> > Leaving the following to still be included:
> > - group_id
> > - group_instance_id
> > - transactional_id
> > etc.
> >
> > What do you think of that?
>
> Yes, that might be a reasonable balance. I don't think we currently
> associate group_id with a connection, so it would be reasonable to re-send
> it in the telemetry request.
>
> What I would suggest next is to set up a protocol definition for the
> metrics you are sending over the wire. For example, you could specify
> something like this for producer metrics:
>
> > { "name": "clientProducerRecordBytes", "type": "int64",
> >   "versions": "0+", "taggedVersions": "0+", "tag": 0,
> >   "about": "client.producer.record.bytes" },
> > { "name": "clientProducerQueueMaxMessages", "type": "int32",
> >   "versions": "0+", "taggedVersions": "0+", "tag": 1,
> >   "about": "client.producer.queue.max.messages" },
>
> This specifies what it is (int32, int64, etc.), how it's being sent, etc.
>
> best,
> Colin
>
> > Thanks,
> > Magnus
> >
> > > best,
> > > Colin
> > >
> > > > Thanks,
> > > > Magnus

--
Best,
Feng
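P.S. On the label split above, a small sketch of what the client itself would
still attach to pushed metrics, versus what the broker can already derive from
the connection/registration (class and method names purely illustrative):

  import java.util.LinkedHashMap;
  import java.util.Map;

  // Illustrative only: client_id, client_instance_id and client_software_*
  // are omitted because the broker already knows them; only labels the broker
  // cannot derive on its own are attached by the client.
  public final class TelemetryLabels {
      public static Map<String, String> clientProvidedLabels(
              String groupId, String groupInstanceId, String transactionalId) {
          Map<String, String> labels = new LinkedHashMap<>();
          if (groupId != null) labels.put("group_id", groupId);
          if (groupInstanceId != null) labels.put("group_instance_id", groupInstanceId);
          if (transactionalId != null) labels.put("transactional_id", transactionalId);
          return labels;
      }

      private TelemetryLabels() {}
  }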