Hi Colin,

It was just analogy to say api version is similar to subscription Id. Every
request come with api version information, broker can return an error if
supported api version has been changed. It’s similar to the role of
subscriptionid here.

Thanks,
Feng



On Mon, Sep 20, 2021 at 9:51 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Mon, Sep 20, 2021, at 12:30, Feng Min wrote:
> > Some comments about subscriptionId.
> >
> > ApiVersion is not a good example. API Version here is actually acting
> like
> > an identifier as the client will carry this information. Forcing to
> > disconnect a connection from the server side is quite heavy. IMHO, the
> > behavior is kind of part of the protocol. Adding subscriptionId is
> > relatively simple and straightforward.
> >
>
> Hi Feng,
>
> Sorry, I'm not sure what you mean by "API Version here is actually acting
> like an identifier." APIVersions is not an identifier. Each client gets the
> same ApiVersionsResponse from the broker. In most clusters, each broker
> will return the same set of ApiVersionsResponse as well. So you can not use
> ApiVersionsResponse as an identifier of anything, as far as I can see.
>
> Dropping a connection is not that "heavy" considering that it only has to
> happen when we change the metrics subscription, which should be a very rare
> event, if I understand the proposal correctly.
>
> best,
> Colin
>
>
> >
> >
> >> Hmm, SubscriptionId seems rather complex. We don't have this kind of
> >> complicated machinery for changing ApiVersions, and that is something
> that
> >> can also change over time, and which affects the clients.
> >>
> >> Changing the configured metrics should be extremely rare. In this case,
> >> why don't we just close all connections on the broker side? Then the
> >> clients can re-connect and re-fetch the information about the metrics
> >> they're supposed to send.
> >>
> >> >
> >> > Something like this:
> >> >
> >> > // Get the configured metrics subscription.
> >> > GetTelemetrySubscriptionsRequest {
> >> >    StrNull  ClientInstanceId  // Null on first invocation to retrieve
> a
> >> > newly generated instance id from the broker.
> >> > }
> >>
> >> It seems like the goal here is to have the client register itself, so
> that
> >> we can tell if this is an old client reconnecting. If that is the case,
> >> then I suggest to rename the RPC to RegisterClient.
> >>
> >> I think we need a better name than "clientInstanceId" since that name is
> >> very similar to "clientId." Perhaps something like originId? Or
> clientUuid?
> >> Let's also use UUID here rather than a string.
> >>
> >> > 6. > PushTelemetryRequest{
> >> >        ClientInstanceId = f00d-feed-deff-ceff-ffff-….,
> >> >        SubscriptionId = 0x234adf34,
> >> >        ContentType = OTLPv8|ZSTD,
> >> >        Terminating = False,
> >> >        Metrics = …// zstd-compressed OTLPv08-protobuf-serialized
> metrics
> >> >   }
> >>
> >> It's not necessary for the client to re-send its client instance ID
> here,
> >> since it already registered with RegisterClient. If the TCP connection
> >> dropped, it will have to re-send RegisterClient anyway. SubscriptionID
> we
> >> should get rid of, as I said above.
> >>
> >> I don't see the need for protobufs. Why not just use Kafka's own
> >> serialization mechanism? As much as possible, we should try to avoid
> >> creating "turduckens" of protocol X containing a buffer serialized with
> >> protocol Y, containing a protocol serialized with protocol Z. These
> aren't
> >> conducive to a good implementation, and make it harder for people to
> write
> >> clients. Just use Kafka's RPC protocol (with optional fields if you
> wish).
> >>
> >> If we do compression on Kafka RPC, I would prefer that we do it a more
> >> generic way that applies to all control messages, not just this one. I
> also
> >> doubt we need to support lots and lots of different compression codecs,
> at
> >> first at least.
> >>
> >> Another thing I'd like to understand is whether we truly need
> >> "terminating" (or anything like it). I'm still confused about how the
> >> backend could use this. Keep in mind that we may receive it on multiple
> >> brokers (or not receive it at all). We may receive more stuff about
> client
> >> XYZ from broker 1 after we have already received a "terminated" for
> client
> >> XYZ from broker 2.
> >>
> >> > If the broker connection goes down or the connection is to be used for
> >> > other purposes (e.g., blocking FetchRequests), the client will send
> >> > PushTelemetryRequests to any other broker in the cluster, using the
> same
> >> > ClientInstanceId and SubscriptionId as received in the latest
> >> > GetTelemetrySubscriptionsResponse.
> >> >
> >> > While the subscriptionId may change during the lifetime of the client
> >> > instance (when metric subscriptions are updated), the
> ClientInstanceId is
> >> > only acquired once and must not change (as it is used to identify the
> >> > unique client instance).
> >> > ...
> >> > What we do want though is ability to single out a specific client
> >> instance
> >> > to give it a more fine-grained subscription for troubleshooting, and
> >> > we can do that with the current proposal with matching solely on the
> >> > CLIENT_INSTANCE_ID.
> >> > In other words; all clients will have the same standard metrics
> >> > subscription, but specific client instances can have alternate
> >> > subscriptions.
> >>
> >> That makes sense, and gives a good reason why we might want to couple
> >> finding the metrics info to passing the client UUID.
> >>
> >> > The metrics collector/tsdb/whatever will need to identify a single
> client
> >> > instance, regardless of which broker received the metrics.
> >> > The chapter on CLIENT_INSTANCE_ID motivates why we need a unique
> >> > identifier, basically because neither clientID, principal or remote
> >> > address:port, etc, can be
> >> > used to identify a single client instance.
> >> >
> >>
> >> Thanks for the background. I agree that a UUID is useful here.
> >>
> >> This also gives additional reasons why using a UUID rather than a
> >> free-form string is desirable (many databases can handle numbers more
> >> efficiently than strings).
> >>
> >> > > Unfortunately, we have to strictly control the metrics format,
> because
> >> > > otherwise clients can't implement it. I agree that we don't need to
> >> specify
> >> > > how the broker-side code works, since that is pluggable. It's also
> >> > > reasonable for the clients to have pluggable extensions as well, but
> >> this
> >> > > KIP won't be of much use if we don't at least define a basic set of
> >> metrics
> >> > > that most clients can understand how to send. The open source
> clients
> >> will
> >> > > not implement anything more than what is specified in the KIP (or at
> >> least
> >> > > the AK one won't...)
> >> > >
> >> >
> >> > Makes sense, in the updated proposal above I changed ContentType to a
> >> > bitmask.
> >> >
> >>
> >> Hmm. You might have forgotten to save your changes. It's still listed
> as a
> >> string in the KIP.
> >>
> >> > > I'm not sure if OpenCensus adds any value to this KIP, to be honest.
> >> Their
> >> > > primary focus was never on the format of the data being sent (in
> fact,
> >> the
> >> > > last time they checked, they left the format up to each OpenCensus
> >> > > implementation). That may have changed, but I think it still has
> >> limited
> >> > > usefulness to us, since we have our own format which we have to use
> >> anyway.
> >> > >
> >> >
> >> > Oh, I meant concensus as in kafka-dev agreement :)
> >> >
> >> > Feng is looking into the implementation details of the Java client and
> >> will
> >> > update the KIP with regards to dependencies.
> >>
> >> I still don't understand what value using the OpenTelemetry format is
> >> giving us here, as opposed to using Kafka's own format. I guess we will
> >> need to talk more about this.
> >>
> >> > > Hmm, that data is about 10 fields, most of which are strings. It
> >> certainly
> >> > > adds a lot of overhead to resend it each time.
> >> > >
> >> > > I don't follow the comment about unpacking and repacking -- since
> the
> >> > > client registered with the broker it already knows all this
> >> information, so
> >> > > there's nothing to unpack or repack, except from memory. If it's
> more
> >> > > convenient to serialize it once rather than multiple times, that is
> an
> >> > > implementation detail of the broker side plugin, which we are not
> >> > > specifying here anyway.
> >> > >
> >> >
> >> > The current proposal is pretty much stateless on the broker, it does
> not
> >> > need to hold any state for a client (instance), and no state
> >> > synchronization is needed
> >> > between brokers in the cluster, which allows a client to seamlessly
> send
> >> > metrics to any broker it wants and keeps the API overhead down (no
> need
> >> to
> >> > re-register when
> >> > switching brokers for instance).
> >> >
> >> > We could remove the labels that are already available to the broker
> on a
> >> > per-request basis or that it already maintains state for:
> >> >  - client_id
> >> >  - client_instance_id
> >> >  - client_software_*
> >> >
> >> > Leaving the following to still be included:
> >> >  - group_id
> >> >  - group_instance_id
> >> >  - transactional_id
> >> >   etc..
> >> >
> >> > What do you think of that?
> >>
> >> Yes, that might be a reasonable balance. I don't think we currently
> >> associate group_id with a connection, so it would be reasonable to
> re-send
> >> it in the telemetry request.
> >>
> >> What I would suggest next is to set up a protocol definition for the
> >> metrics you are sending over the wire. For example, you could specify
> >> something like this for producer metrics:
> >>
> >> > { "name": "clientProducerRecordBytes", "type": "int64",
> >> > "versions": "0+", "taggedVersions": "0+", "tag": 0,
> >> >  "about": "client.producer.record.bytes" },
> >> > { "name": "clientProducerQueueMaxMessages", "type": "int32",
> >> > "versions": "0+", "taggedVersions": "0+", "tag": 1,
> >> > "about": "client.producer.queue.max.messages" },
> >>
> >> This specifies what it is (int32, int64, etc.), how it's being sent,
> etc.
> >>
> >> best,
> >> Colin
> >>
> >> >
> >> >
> >> > Thanks,
> >> > Magnus
> >> >
> >> >
> >> >
> >> > >
> >> > > best,
> >> > > Colin
> >> > >
> >> > > > Thanks,
> >> > > > Magnus
> >> > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> > --
> > Best,
> > Feng
>
-- 
Best,
Feng

Reply via email to