Re: [DISCUSS] KIP-714: Client metrics and observability

Kirk True Mon, 13 Dec 2021 11:06:38 -0800

Hi Jun,

On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote:
> 13. Using OpenTelemetry. Does that require runtime dependency
> on OpenTelemetry library? How good is the compatibility story
> of OpenTelemetry? This is important since an application could have other
> OpenTelemetry dependencies than the Kafka client.


The current design is that the OpenTelemetry JARs would ship with the client. 
Perhaps we can design the client such that the JARs aren't even loaded if the 
user has opted out. The user could even exclude the JARs from their 
dependencies if they so wished.

I can't speak to the compatibility of the libraries. Is it possible that we 
include a shaded version?

Thanks,
Kirk

> 
> 14. The proposal listed idempotence=true. This is more of a configuration
> than a metric. Are we including that as a metric? What other configurations
> are we including? Should we separate the configurations from the metrics?
> 
> Thanks,
> 
> Jun
> 
> On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill <[email protected]> wrote:
> 
> > Hey Bob,
> >
> > That's a good point.
> >
> > Request type labels were considered but since they're already tracked by
> > broker-side metrics
> > they were left out as to avoid metric duplication, however those metrics
> > are not per connection,
> > so they won't be that useful in practice for troubleshooting specific
> > client instances.
> >
> > I'll add the request_type label to the relevant metrics.
> >
> > Thanks,
> > Magnus
> >
> >
> > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett
> > <[email protected]>:
> >
> > > Hi Magnus,
> > >
> > > Thanks for the thorough KIP, this seems very useful.
> > >
> > > Would it make sense to include the request type as a label for the
> > > `client.request.success`, `client.request.errors` and
> > `client.request.rtt`
> > > metrics? I think it would be very useful to see which specific requests
> > are
> > > succeeding and failing for a client. One specific case I can think of
> > where
> > > this could be useful is producer batch timeouts. If a Java application
> > does
> > > not enable producer client logs (unfortunately, in my experience this
> > > happens more often than it should), the application logs will only
> > contain
> > > the expiration error message, but no information about what is causing
> > the
> > > timeout. The requests might all be succeeding but taking too long to
> > > process batches, or metadata requests might be failing, or some or all
> > > produce requests might be failing (if the bootstrap servers are reachable
> > > from the client but one or more other brokers are not, for example). If
> > the
> > > cluster operator is able to identify the specific requests that are slow
> > or
> > > failing for a client, they will be better able to diagnose the issue
> > > causing batch timeouts.
> > >
> > > One drawback I can think of is that this will increase the cardinality of
> > > the request metrics. But any given client is only going to use a small
> > > subset of the request types, and since we already have partition labels
> > for
> > > the topic-level metrics, I think request labels will still make up a
> > > relatively small percentage of the set of metrics.
> > >
> > > Thanks,
> > > Bob
> > >
> > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi Magnus,
> > > >
> > > > I think this is a very useful addition. We also have a similar (but
> > much
> > > > more simplistic) implementation of this. Maybe I missed it in the KIP
> > but
> > > > what about adding metrics about the subscription cache itself? That I
> > > think
> > > > would improve its usability and debuggability as we'd be able to see
> > its
> > > > performance, hit/miss rates, eviction counts and others.
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Mickael,
> > > > >
> > > > > see inline.
> > > > >
> > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael Maison <
> > > > > [email protected]
> > > > > >:
> > > > >
> > > > > > Hi Magnus,
> > > > > >
> > > > > > I see you've addressed some of the points I raised above but some
> > (4,
> > > > > > 5) have not been addressed yet.
> > > > > >
> > > > >
> > > > > Re 4) How will the user/app know metrics are being sent.
> > > > >
> > > > > One possibility is to add a JMX metric (thus for user consumption)
> > for
> > > > the
> > > > > number of metric pushes the
> > > > > client has performed, or perhaps the number of metrics subscriptions
> > > > > currently being collected.
> > > > > Would that be sufficient?
> > > > >
> > > > > Re 5) Metric sizes and rates
> > > > >
> > > > > A worst case scenario for a producer that is producing to 50 unique
> > > > topics
> > > > > and emitting all standard metrics yields
> > > > > a serialized size of around 100KB prior to compression, which
> > > compresses
> > > > > down to about 20-30% of that depending
> > > > > on compression type and topic name uniqueness.
> > > > > The numbers for a consumer would be similar.
> > > > >
> > > > > In practice the number of unique topics would be far less, and the
> > > > > subscription set would typically be for a subset of metrics.
> > > > > So we're probably closer to 1kb, or less, compressed size per client
> > > per
> > > > > push interval.
> > > > >
> > > > > As both the subscription set and push intervals are controlled by the
> > > > > cluster operator it shouldn't be too hard
> > > > > to strike a good balance between metrics overhead and granularity.
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > I'm really uneasy with this being enabled by default on the client
> > > > > > side. When collecting data, I think the best practice is to ensure
> > > > > > users are explicitly enabling it.
> > > > > >
> > > > >
> > > > > Requiring metrics to be explicitly enabled on clients severely
> > cripples
> > > > its
> > > > > usability and value.
> > > > >
> > > > > One of the problems that this KIP aims to solve is for useful metrics
> > > to
> > > > be
> > > > > available on demand
> > > > > regardless of the technical expertise of the user. As Ryanne points,
> > > out
> > > > a
> > > > > savvy user/organization
> > > > > will typically have metrics collection and monitoring in place
> > already,
> > > > and
> > > > > the benefits of this KIP
> > > > > are then more of a common set and format metrics across client
> > > > > implementations and languages.
> > > > > But that is not the typical Kafka user in my experience, they're not
> > > > Kafka
> > > > > experts and they don't have the
> > > > > knowledge of how to best instrument their clients.
> > > > > Having metrics enabled by default for this user base allows the Kafka
> > > > > operators to proactively and reactively
> > > > > monitor and troubleshoot client issues, without the need for the less
> > > > savvy
> > > > > user to do anything.
> > > > > It is often too late to tell a user to enable metrics when the
> > problem
> > > > has
> > > > > already occurred.
> > > > >
> > > > > Now, to be clear, even though metrics are enabled by default on
> > clients
> > > > it
> > > > > is not enabled by default
> > > > > on the brokers; the Kafka operator needs to build and set up a
> > metrics
> > > > > plugin and add metrics subscriptions
> > > > > before anything is sent from the client.
> > > > > It is opt-out on the clients and opt-in on the broker.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > You mentioned brokers already have
> > > > > > some(most?) of the information contained in metrics, if so then why
> > > > > > are we collecting it again? Surely there must be some new
> > information
> > > > > > in the client metrics.
> > > > > >
> > > > >
> > > > > From the user's perspective the Kafka infrastructure extends from
> > > > > producer.send() to
> > > > > messages being returned from consumer.poll(), a giant black box where
> > > > > there's a lot going on between those
> > > > > two points. The brokers currently only see what happens once those
> > > > requests
> > > > > and messages hits the broker,
> > > > > but as Kafka clients are complex pieces of machinery there's a myriad
> > > of
> > > > > queues, timers, and state
> > > > > that's critical to the operation and infrastructure that's not
> > > currently
> > > > > visible to the operator.
> > > > > Relying on the user to accurately and timely provide this missing
> > > > > information is not generally feasible.
> > > > >
> > > > >
> > > > > Most of the standard metrics listed in the KIP are data points that
> > the
> > > > > broker does not have.
> > > > > Only a small number of metrics are duplicates (like the request
> > counts
> > > > and
> > > > > sizes), but they are included
> > > > > to ease correlation when inspecting these client metrics.
> > > > >
> > > > >
> > > > >
> > > > > > Moreover this is a brand new feature so it's even harder to justify
> > > > > > enabling it and forcing onto all our users. If disabled by default,
> > > > > > it's relatively easy to enable in a new release if we decide to,
> > but
> > > > > > once enabled by default it's much harder to disable. Also this
> > > feature
> > > > > > will apply to all future metrics we will add.
> > > > > >
> > > > >
> > > > > I think maturity of a feature implementation should be the deciding
> > > > factor,
> > > > > rather than
> > > > > the design of it (which this KIP is). I.e., if the implementation is
> > > not
> > > > > deemed mature enough
> > > > > for release X.Y it will be disabled.
> > > > >
> > > > >
> > > > >
> > > > > > Overall I think it's an interesting feature but I'd prefer to be
> > > > > > slightly defensive and see how it works in practice before enabling
> > > it
> > > > > > everywhere.
> > > > > >
> > > > >
> > > > > Right, and I agree on being defensive, but since this feature still
> > > > > requires manual
> > > > > enabling on the brokers before actually being used, I think that
> > gives
> > > > > enough control
> > > > > to opt-in or out of this feature as needed.
> > > > >
> > > > > Thanks for your comments!
> > > > >
> > > > > Regards,
> > > > > Magnus
> > > > >
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Mickael
> > > > > >
> > > > > > On Mon, Nov 8, 2021 at 7:55 AM Magnus Edenhill <[email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > Thanks David for pointing this out,
> > > > > > > I've updated the KIP to include client_id as a matching selector.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Magnus
> > > > > > >
> > > > > > > Den tors 4 nov. 2021 kl 18:01 skrev David Mao
> > > > > <[email protected]
> > > > > > >:
> > > > > > >
> > > > > > > > Hey Magnus,
> > > > > > > >
> > > > > > > > I noticed that the KIP outlines the initial selectors supported
> > > as:
> > > > > > > >
> > > > > > > >    - client_instance_id - CLIENT_INSTANCE_ID UUID string
> > > > > > representation.
> > > > > > > >    - client_software_name  - client software implementation
> > name.
> > > > > > > >    - client_software_version  - client software implementation
> > > > > version.
> > > > > > > >
> > > > > > > > In the given reactive monitoring workflow, we mention that the
> > > > > > application
> > > > > > > > user does not know their client's client instance ID, but it's
> > > > > outlined
> > > > > > > > that the operator can add a metrics subscription selecting for
> > > > > > clientId. I
> > > > > > > > don't see clientId as one of the supported selectors.
> > > > > > > > I can see how this would have made sense in a previous
> > iteration
> > > > > given
> > > > > > that
> > > > > > > > the previous client instance ID proposal was to construct the
> > > > client
> > > > > > > > instance ID using clientId as a prefix. Now that the client
> > > > instance
> > > > > > ID is
> > > > > > > > a UUID, would we want to add clientId as a supported selector?
> > > > > > > > Let me know what you think.
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > > > On Tue, Oct 19, 2021 at 12:33 PM Magnus Edenhill <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Mickael!
> > > > > > > > >
> > > > > > > > > Den tis 19 okt. 2021 kl 19:30 skrev Mickael Maison <
> > > > > > > > > [email protected]
> > > > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Hi Magnus,
> > > > > > > > > >
> > > > > > > > > > Thanks for the proposal.
> > > > > > > > > >
> > > > > > > > > > 1. Looking at the protocol section, isn't
> > "ClientInstanceId"
> > > > > > expected
> > > > > > > > > > to be a field in GetTelemetrySubscriptionsResponseV0?
> > > > Otherwise,
> > > > > > how
> > > > > > > > > > does a client retrieve this value?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Good catch, it got removed by mistake in one of the edits.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. In the client API section, you mention a new method
> > > > > > > > > > "clientInstanceId()". Can you clarify which interfaces are
> > > > > > affected?
> > > > > > > > > > Is it only Consumer and Producer?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > And Admin. Will update the KIP.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 3. I'm a bit concerned this is enabled by default. Even if
> > > the
> > > > > data
> > > > > > > > > > collected is supposed to be not sensitive, I think this can
> > > be
> > > > > > > > > > problematic in some environments. Also users don't seem to
> > > have
> > > > > the
> > > > > > > > > > choice to only expose some metrics. Knowing how much data
> > > > transit
> > > > > > > > > > through some applications can be considered critical.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The broker already knows how much data transits through the
> > > > client
> > > > > > > > though,
> > > > > > > > > right?
> > > > > > > > > Care has been taken not to expose information in the standard
> > > > > metrics
> > > > > > > > that
> > > > > > > > > might
> > > > > > > > > reveal sensitive information.
> > > > > > > > >
> > > > > > > > > Do you have an example of how the proposed metrics could leak
> > > > > > sensitive
> > > > > > > > > information?
> > > > > > > > > As for limiting the what metrics to export; I guess that
> > could
> > > > make
> > > > > > sense
> > > > > > > > > in some
> > > > > > > > > very sensitive use-cases, but those users might disable
> > metrics
> > > > > > > > altogether
> > > > > > > > > for now.
> > > > > > > > > Could these concerns be addressed by a later KIP?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 4. As a user, how do you know if your application is
> > actively
> > > > > > sending
> > > > > > > > > > metrics? Are there new metrics exposing what's going on,
> > like
> > > > how
> > > > > > much
> > > > > > > > > > data is being sent?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > That's a good question.
> > > > > > > > > Since the proposed metrics interface is not aimed at, or
> > > directly
> > > > > > > > available
> > > > > > > > > to, the application
> > > > > > > > > I guess there's little point of adding it here, but instead
> > > > adding
> > > > > > > > > something to the
> > > > > > > > > existing JMX metrics?
> > > > > > > > > Do you have any suggestions?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 5. If all metrics are enabled on a regular Consumer or
> > > > Producer,
> > > > > do
> > > > > > > > > > you have an idea how much throughput this would use?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > It depends on the number of partition/topics/etc the client
> > is
> > > > > > producing
> > > > > > > > > to/consuming from.
> > > > > > > > > I'll add some sizes to the KIP for some typical use-cases.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Magnus
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 19, 2021 at 5:06 PM Magnus Edenhill <
> > > > > > [email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Den tis 19 okt. 2021 kl 13:22 skrev Tom Bentley <
> > > > > > [email protected]
> > > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Magnus,
> > > > > > > > > > > >
> > > > > > > > > > > > I reviewed the KIP since you called the vote (sorry for
> > > not
> > > > > > > > reviewing
> > > > > > > > > > when
> > > > > > > > > > > > you announced your intention to call the vote). I have
> > a
> > > > few
> > > > > > > > > questions
> > > > > > > > > > on
> > > > > > > > > > > > some of the details.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. There's no Javadoc on ClientTelemetryPayload.data(),
> > > so
> > > > I
> > > > > > don't
> > > > > > > > > know
> > > > > > > > > > > > whether the payload is exposed through this method as
> > > > > > compressed or
> > > > > > > > > > not.
> > > > > > > > > > > > Later on you say "Decompression of the payloads will be
> > > > > > handled by
> > > > > > > > > the
> > > > > > > > > > > > broker metrics plugin, the broker should expose a
> > > suitable
> > > > > > > > > > decompression
> > > > > > > > > > > > API to the metrics plugin for this purpose.", which
> > > > suggests
> > > > > > it's
> > > > > > > > the
> > > > > > > > > > > > compressed data in the buffer, but then we don't know
> > > which
> > > > > > codec
> > > > > > > > was
> > > > > > > > > > used,
> > > > > > > > > > > > nor the API via which the plugin should decompress it
> > if
> > > > > > required
> > > > > > > > for
> > > > > > > > > > > > forwarding to the ultimate metrics store. Should the
> > > > > > > > > > ClientTelemetryPayload
> > > > > > > > > > > > expose a method to get the compression and a
> > > decompressor?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Good point, updated.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 2. The client-side API is expressed as StringOrError
> > > > > > > > > > > > ClientInstance::ClientInstanceId(int timeout_ms). I
> > > > > understand
> > > > > > that
> > > > > > > > > > you're
> > > > > > > > > > > > thinking about the librdkafka implementation, but it
> > > would
> > > > be
> > > > > > good
> > > > > > > > to
> > > > > > > > > > show
> > > > > > > > > > > > the API as it would appear on the Apache Kafka clients.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This was meant as pseudo-code, but I changed it to Java.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 3. "PushTelemetryRequest|Response - protocol request
> > used
> > > > by
> > > > > > the
> > > > > > > > > > client to
> > > > > > > > > > > > send metrics to any broker it is connected to." To be
> > > > clear,
> > > > > > this
> > > > > > > > > means
> > > > > > > > > > > > that the client can choose any of the connected brokers
> > > and
> > > > > > push to
> > > > > > > > > > just
> > > > > > > > > > > > one of them? What should a supporting client do if it
> > > gets
> > > > an
> > > > > > error
> > > > > > > > > > when
> > > > > > > > > > > > pushing metrics to a broker, retry sending to the same
> > > > broker
> > > > > > or
> > > > > > > > try
> > > > > > > > > > > > pushing to another broker, or drop the metrics? Should
> > > > > > supporting
> > > > > > > > > > clients
> > > > > > > > > > > > send successive requests to a single broker, or round
> > > > robin,
> > > > > > or is
> > > > > > > > > > that up
> > > > > > > > > > > > to the client author? I'm guessing the behaviour should
> > > be
> > > > > > sticky
> > > > > > > > to
> > > > > > > > > > > > support the rate limiting features, but I think it
> > would
> > > be
> > > > > > good
> > > > > > > > for
> > > > > > > > > > client
> > > > > > > > > > > > authors if this section were explicit on the
> > recommended
> > > > > > behaviour.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > You are right, I've updated the KIP to make this clearer.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 4. "Mapping the client instance id to an actual
> > > application
> > > > > > > > instance
> > > > > > > > > > > > running on a (virtual) machine can be done by
> > inspecting
> > > > the
> > > > > > > > metrics
> > > > > > > > > > > > resource labels, such as the client source address and
> > > > source
> > > > > > port,
> > > > > > > > > or
> > > > > > > > > > > > security principal, all of which are added by the
> > > receiving
> > > > > > broker.
> > > > > > > > > > This
> > > > > > > > > > > > will allow the operator together with the user to
> > > identify
> > > > > the
> > > > > > > > actual
> > > > > > > > > > > > application instance." Is this really always true? The
> > > > source
> > > > > > IP
> > > > > > > > and
> > > > > > > > > > port
> > > > > > > > > > > > might be a loadbalancer/proxy in some setups. The
> > > > principal,
> > > > > as
> > > > > > > > > already
> > > > > > > > > > > > mentioned in the KIP, might be shared between multiple
> > > > > > > > applications.
> > > > > > > > > > So at
> > > > > > > > > > > > worst the organization running the clients might have
> > to
> > > > > > consult
> > > > > > > > the
> > > > > > > > > > logs
> > > > > > > > > > > > of a set of client applications, right?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, that's correct. There's no guaranteed mapping from
> > > > > > > > > > client_instance_id
> > > > > > > > > > > to
> > > > > > > > > > > an actual instance, that's why the KIP recommends client
> > > > > > > > > implementations
> > > > > > > > > > to
> > > > > > > > > > > log the client instance id
> > > > > > > > > > > upon retrieval, and also provide an API for the
> > application
> > > > to
> > > > > > > > retrieve
> > > > > > > > > > the
> > > > > > > > > > > instance id programmatically
> > > > > > > > > > > if it has a better way of exposing it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 5. "Tests indicate that a compression ratio up to 10x is
> > > > > > possible for
> > > > > > > > > the
> > > > > > > > > > > > standard metrics." Client authors might appreciate your
> > > > > > mentioning
> > > > > > > > > > which
> > > > > > > > > > > > compression codec got these results.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Good point. Updated.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 6. "Should the client send a push request prior to
> > expiry
> > > > of
> > > > > > the
> > > > > > > > > > previously
> > > > > > > > > > > > calculated PushIntervalMs the broker will discard the
> > > > metrics
> > > > > > and
> > > > > > > > > > return a
> > > > > > > > > > > > PushTelemetryResponse with the ErrorCode set to
> > > > RateLimited."
> > > > > > Is
> > > > > > > > this
> > > > > > > > > > > > RATE_LIMITED a new error code? It's not mentioned in
> > the
> > > > "New
> > > > > > Error
> > > > > > > > > > Codes"
> > > > > > > > > > > > section.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > That's a leftover, it should be using the standard
> > > > ThrottleTime
> > > > > > > > > > mechanism.
> > > > > > > > > > > Fixed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 7. In the section "Standard client resource labels"
> > > > > > application_id
> > > > > > > > is
> > > > > > > > > > > > described as Kafka Streams only, but the section of
> > > "Client
> > > > > > > > > > Identification"
> > > > > > > > > > > > talks about "application instance id as an optional
> > > future
> > > > > > > > > nice-to-have
> > > > > > > > > > > > that may be included as a metrics label if it has been
> > > set
> > > > by
> > > > > > the
> > > > > > > > > > user", so
> > > > > > > > > > > > I'm confused whether non-Kafka Streams clients should
> > set
> > > > an
> > > > > > > > > > application_id
> > > > > > > > > > > > or not.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'll clarify this in the KIP, but basically we would need
> > > to
> > > > > add
> > > > > > an `
> > > > > > > > > > > application.id` config
> > > > > > > > > > > property for non-streams clients for this purpose, and
> > > that's
> > > > > > outside
> > > > > > > > > the
> > > > > > > > > > > scope of this KIP since we want to make it zero-conf:ish
> > on
> > > > the
> > > > > > > > client
> > > > > > > > > > side.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Kind regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Tom
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the review,
> > > > > > > > > > > Magnus
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 7, 2021 at 5:26 PM Magnus Edenhill <
> > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the KIP following our recent discussions
> > > on
> > > > > the
> > > > > > > > > mailing
> > > > > > > > > > > > list:
> > > > > > > > > > > > >  - split the protocol in two, one for getting the
> > > metrics
> > > > > > > > > > subscriptions,
> > > > > > > > > > > > > and one for pushing the metrics.
> > > > > > > > > > > > >  - simplifications: initially only one supported
> > > metrics
> > > > > > format,
> > > > > > > > no
> > > > > > > > > > > > > client.id in the instance id, etc.
> > > > > > > > > > > > >  - made CLIENT_METRICS subscription configuration
> > > entries
> > > > > > more
> > > > > > > > > > structured
> > > > > > > > > > > > >    and allowing better client matching selectors (not
> > > > only
> > > > > > on the
> > > > > > > > > > > > instance
> > > > > > > > > > > > > id, but also the other
> > > > > > > > > > > > >    client resource labels, such as
> > > client_software_name,
> > > > > > etc.).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unless there are further comments I'll call the vote
> > > in a
> > > > > > day or
> > > > > > > > > two.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Magnus
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Den mån 4 okt. 2021 kl 20:57 skrev Magnus Edenhill <
> > > > > > > > > > [email protected]>:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Gwen,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm finishing up the KIP based on the last couple
> > of
> > > > > > discussion
> > > > > > > > > > points
> > > > > > > > > > > > in
> > > > > > > > > > > > > > this thread
> > > > > > > > > > > > > > and will call the Vote later this week.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Magnus
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Den lör 2 okt. 2021 kl 02:01 skrev Gwen Shapira
> > > > > > > > > > > > > <[email protected]
> > > > > > > > > > > > > > >:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Hey,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I noticed that there was no discussion for the
> > last
> > > 10
> > > > > > days,
> > > > > > > > > but I
> > > > > > > > > > > > > >> couldn't
> > > > > > > > > > > > > >> find the vote thread. Is there one that I'm
> > missing?
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Gwen
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Wed, Sep 22, 2021 at 4:58 AM Magnus Edenhill <
> > > > > > > > > > [email protected]>
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Den tis 21 sep. 2021 kl 06:58 skrev Colin
> > McCabe <
> > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > >:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > On Mon, Sep 20, 2021, at 17:35, Feng Min
> > wrote:
> > > > > > > > > > > > > >> > > > Thanks Magnus & Colin for the discussion.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Based on KIP-714's stateless design, Client
> > > can
> > > > > > pretty
> > > > > > > > > much
> > > > > > > > > > use
> > > > > > > > > > > > > any
> > > > > > > > > > > > > >> > > > connection to any broker to send metrics. We
> > > are
> > > > > not
> > > > > > > > > > associating
> > > > > > > > > > > > > >> > > connection
> > > > > > > > > > > > > >> > > > with client metric state. Is my
> > understanding
> > > > > > correct?
> > > > > > > > If
> > > > > > > > > > yes,
> > > > > > > > > > > > > how
> > > > > > > > > > > > > >> > about
> > > > > > > > > > > > > >> > > > the following two scenarios
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > 1) One Client (Client-ID) registers two
> > > > different
> > > > > > client
> > > > > > > > > > > > instance
> > > > > > > > > > > > > id
> > > > > > > > > > > > > >> > via
> > > > > > > > > > > > > >> > > > separate registration. Is it permitted? If
> > OK,
> > > > how
> > > > > > to
> > > > > > > > > > > > distinguish
> > > > > > > > > > > > > >> them
> > > > > > > > > > > > > >> > > from
> > > > > > > > > > > > > >> > > > the case 2 below.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Hi Feng,
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > My understanding, which Magnus can clarify I
> > > > guess,
> > > > > is
> > > > > > > > that
> > > > > > > > > > you
> > > > > > > > > > > > > could
> > > > > > > > > > > > > >> > have
> > > > > > > > > > > > > >> > > something like two Producer instances running
> > > with
> > > > > the
> > > > > > > > same
> > > > > > > > > > > > > client.id
> > > > > > > > > > > > > >> > > (perhaps because they're using the same config
> > > > file,
> > > > > > for
> > > > > > > > > > example).
> > > > > > > > > > > > > >> They
> > > > > > > > > > > > > >> > > could even be in the same process. But they
> > > would
> > > > > get
> > > > > > > > > separate
> > > > > > > > > > > > > UUIDs.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > I believe Magnus used the term client to mean
> > > > > > "Producer or
> > > > > > > > > > > > > Consumer".
> > > > > > > > > > > > > >> So
> > > > > > > > > > > > > >> > > if you have both a Producer and a Consumer in
> > > your
> > > > > > > > > > application I
> > > > > > > > > > > > > would
> > > > > > > > > > > > > >> > > expect you'd get separate UUIDs for both.
> > Again
> > > > > > Magnus can
> > > > > > > > > > chime
> > > > > > > > > > > > in
> > > > > > > > > > > > > >> > here, I
> > > > > > > > > > > > > >> > > guess.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > That's correct.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > 2) How about the client restarting? What's
> > the
> > > > > > > > > expectation?
> > > > > > > > > > > > Should
> > > > > > > > > > > > > >> the
> > > > > > > > > > > > > >> > > > server expect the client to carry a
> > persisted
> > > > > client
> > > > > > > > > > instance id
> > > > > > > > > > > > > or
> > > > > > > > > > > > > >> > > should
> > > > > > > > > > > > > >> > > > the client be treated as a new instance?
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > The KIP doesn't describe any mechanism for
> > > > > > persistence,
> > > > > > > > so I
> > > > > > > > > > would
> > > > > > > > > > > > > >> assume
> > > > > > > > > > > > > >> > > that when you restart the client you get a new
> > > > > UUID. I
> > > > > > > > agree
> > > > > > > > > > that
> > > > > > > > > > > > it
> > > > > > > > > > > > > >> > would
> > > > > > > > > > > > > >> > > be good to spell this out.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > Right, it will not be persisted since a client
> > > > > instance
> > > > > > > > can't
> > > > > > > > > be
> > > > > > > > > > > > > >> restarted.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Will update the KIP to make this clearer.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > /Magnus
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> --
> > > > > > > > > > > > > >> Gwen Shapira
> > > > > > > > > > > > > >> Engineering Manager | Confluent
> > > > > > > > > > > > > >> 650.450.2760 | @gwenshap
> > > > > > > > > > > > > >> Follow us: Twitter | blog
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to