Re: [DISCUSS] KIP-714: Client metrics and observability

Kirk True Wed, 09 Mar 2022 15:19:47 -0800

Hi Jun,

On Tue, Mar 8, 2022, at 5:47 PM, Jun Rao wrote:
> Hi, Magnus, Sarat and Xavier,
> 
> Thanks for the reply. A few more comments below.
> 
> 20. It seems that we are piggybacking the plugin on the
> existing MetricsReporter. So, this seems fine.
> 
> 21. That could work. Are we requiring any additional jar dependency on the
> client? Or, are you suggesting that we check the runtime dependency to pick
> the compression codec?


The Java client doesn't require any additional libraries for compression, no.

> 28. For the broker metrics, could you spell out the full metric name
> including groups, tags, etc? We typically don't add the broker_id label for
> broker metrics. Also, brokers use Yammer metrics, which doesn't have type
> Sum.
> 
> 29. There are several client metrics listed as histogram. However, the java
> client currently doesn't support histogram type.

There does appear to be some code related to histograms in the 
org.apache.kafka.common.metrics.stats package. But we're still looking into the 
implementation to see if there's anything needed for KIP-714.

> 30. Could you show an example of the metric payload in PushTelemetryRequest
> to help understand how we organize metrics at different levels (per
> instance, per topic, per partition, per broker, etc)?
> 
> 31. Could you add a bit more detail on which client thread sends the
> PushTelemetryRequest?

Yes, I will add that the KIP.

Thanks,
Kirk

> Thanks,
> 
> Jun
> 
> On Mon, Mar 7, 2022 at 11:48 AM Magnus Edenhill <mag...@edenhill.se> wrote:
> 
> > Hi Jun,
> >
> > thanks for your initiated questions, see my answers below.
> > There's been a number of clarifications to the KIP.
> >
> >
> >
> > Den tors 27 jan. 2022 kl 20:08 skrev Jun Rao <j...@confluent.io.invalid>:
> >
> > > Hi, Magnus,
> > >
> > > Thanks for updating the KIP. The overall approach makes sense to me. A
> > few
> > > more detailed comments below.
> > >
> > > 20. ClientTelemetry: Should it be extending configurable and closable?
> > >
> >
> > I'll pass this question to Sarat and/or Xavier.
> >
> >
> >
> > > 21. Compression of the metrics on the client: what's the default?
> > >
> >
> > How about we specify a prioritized list: zstd, lz4, snappy, gzip?
> > But ultimately it is up to what the client supports.
> >
> >
> > 23. A client instance is considered a metric resource and the
> > > resource-level (thus client instance level) labels could include:
> > >     client_software_name=confluent-kafka-python
> > >     client_software_version=v2.1.3
> > >     client_instance_id=B64CD139-3975-440A-91D4
> > >     transactional_id=someTxnApp
> > > Are those labels added in PushTelemetryRequest? If so, are they per
> > metric
> > > or per request?
> > >
> >
> >
> > client_software* and client_instance_id are not added by the client, but
> > available to
> > the broker-side metrics plugin for adding as it see fits, remove them from
> > the KIP.
> >
> > As for transactional_id, group_id, etc, which I believe will be useful in
> > troubleshooting,
> > are included only once (per push) as resource-level attributes (the client
> > instance is a singular resource).
> >
> >
> > >
> > > 24.  "the broker will only send
> > > GetTelemetrySubscriptionsResponse.DeltaTemporality=True" :
> > > 24.1 If it's always true, does it need to be part of the protocol?
> > >
> >
> > We're anticipating that it will take a lot longer to upgrade the majority
> > of clients than the
> > broker/plugin side, which is why we want the client to support both
> > temporalities out-of-the-box
> > so that cumulative reporting can be turned on seamlessly in the future.
> >
> >
> >
> > > 24.2 Does delta only apply to Counter type?
> > >
> >
> >
> > And Histograms. More details in Xavier's OTLP link.
> >
> >
> >
> > > 24.3 In the delta representation, the first request needs to send the
> > full
> > > value, how does the broker plugin know whether a value is full or delta?
> > >
> >
> > The client may (should) send the start time for each metric sample,
> > indicating when
> > the metric began to be collected.
> > We've discussed whether this should be the client instance start time or
> > the time when a matching
> > metric subscription for that metric is received.
> > For completeness we recommend using the former, the client instance start
> > time.
> >
> >
> >
> > > 25. quota:
> > > 25.1 Since we are fitting PushTelemetryRequest into the existing request
> > > quota, it would be useful to document the impact, i.e. client metric
> > > throttling causes the data from the same client to be delayed.
> > > 25.2 Is PushTelemetryRequest subject to the write bandwidth quota like
> > the
> > > producer?
> > >
> >
> >
> > Yes, it should be, as to protect the cluster from rogue clients.
> > But, in practice the size of metrics will be quite low (e.g., 1-10kb per
> > 60s interval), so I don't think this will pose a problem.
> > The KIP has been updated with more details on quota/throttling behaviour,
> > see the
> > "Throttling and rate-limiting" section.
> >
> >
> > 25.3 THROTTLING_QUOTA_EXCEEDED: Currently, we don't send this error when
> > > the request/bandwidth quota is exceeded since those requests are not
> > > rejected. We only set this error when the request is rejected (e.g.,
> > topic
> > > creation). It would be useful to clarify when this error is used.
> > >
> >
> > Right, I was trying to reuse an existing error-code. We can introduce
> > a new one for the case where a client pushes metrics at a higher frequency
> > than the
> > than the configured push interval (e.g., out-of-profile sends).
> > This causes the broker to drop those metrics and send this error code back
> > to the client. There will be no connection throttling / channel-muting in
> > this
> > case (unless the standard quotas are exceeded).
> >
> >
> > > 27. kafka-client-metrics.sh: Could we add an example on how to disable a
> > > bad client?
> > >
> >
> > There's now a --block option to kafka-client-metrics.sh which overrides all
> > subscriptions
> > for the matched client(s). This allows silencing metrics for one or more
> > clients without having
> > to remove existing subscriptions. From the client's perspective it will
> > look like it no longer has
> > any subscriptions.
> >
> > # Block metrics collection for a specific client instance
> > $ kafka-client-metrics.sh --bootstrap-server $BROKERS \
> >    --add \
> >    --name 'Disabe_b69cc35a' \  # A descriptive name makes it easier to
> > clean up old subscriptions.
> >    --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 \  #
> > Match this specific client instance
> >    --block
> >
> >
> >
> >
> > > 28. New broker side metrics: Could we spell out the details of the
> > metrics
> > > (e.g., group, tags, etc)?
> > >
> >
> > KIP has been updated accordingly (thanks Sarat).
> >
> >
> >
> > >
> > > 29. Client instance-level metrics: client.io.wait.time is a gauge not a
> > > histogram.
> > >
> >
> > I believe a population/distribution should preferably be represented as a
> > histogram, space permitting,
> > and only secondarily as a Gauge average.
> > While we might not want to maintain a bunch of histograms for each
> > partition, since that could be
> > quite space consuming, this client.io.wait.time is a single metric per
> > client instance and can
> > thus afford a Histogram representation.
> >
> >
> >
> > Thanks,
> > Magnus
> >
> >
> >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Jan 26, 2022 at 6:32 AM Magnus Edenhill <mag...@edenhill.se>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've updated the KIP with responses to the latest comments: Java client
> > > > dependencies (Thanks Kirk!), alternate designs (separate cluster,
> > > separate
> > > > producer, etc), etc.
> > > >
> > > > I will revive the vote thread.
> > > >
> > > > Thanks,
> > > > Magnus
> > > >
> > > >
> > > > Den mån 13 dec. 2021 kl 22:32 skrev Ryanne Dolan <
> > ryannedo...@gmail.com
> > > >:
> > > >
> > > > > I think we should be very careful about introducing new runtime
> > > > > dependencies into the clients. Historically this has been rare and
> > > > > essentially necessary (e.g. compression libs).
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Mon, Dec 13, 2021, 1:06 PM Kirk True <k...@mustardgrain.com>
> > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote:
> > > > > > > 13. Using OpenTelemetry. Does that require runtime dependency
> > > > > > > on OpenTelemetry library? How good is the compatibility story
> > > > > > > of OpenTelemetry? This is important since an application could
> > have
> > > > > other
> > > > > > > OpenTelemetry dependencies than the Kafka client.
> > > > > >
> > > > > > The current design is that the OpenTelemetry JARs would ship with
> > the
> > > > > > client. Perhaps we can design the client such that the JARs aren't
> > > even
> > > > > > loaded if the user has opted out. The user could even exclude the
> > > JARs
> > > > > from
> > > > > > their dependencies if they so wished.
> > > > > >
> > > > > > I can't speak to the compatibility of the libraries. Is it possible
> > > > that
> > > > > > we include a shaded version?
> > > > > >
> > > > > > Thanks,
> > > > > > Kirk
> > > > > >
> > > > > > >
> > > > > > > 14. The proposal listed idempotence=true. This is more of a
> > > > > configuration
> > > > > > > than a metric. Are we including that as a metric? What other
> > > > > > configurations
> > > > > > > are we including? Should we separate the configurations from the
> > > > > metrics?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill <
> > > mag...@edenhill.se>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Bob,
> > > > > > > >
> > > > > > > > That's a good point.
> > > > > > > >
> > > > > > > > Request type labels were considered but since they're already
> > > > tracked
> > > > > > by
> > > > > > > > broker-side metrics
> > > > > > > > they were left out as to avoid metric duplication, however
> > those
> > > > > > metrics
> > > > > > > > are not per connection,
> > > > > > > > so they won't be that useful in practice for troubleshooting
> > > > specific
> > > > > > > > client instances.
> > > > > > > >
> > > > > > > > I'll add the request_type label to the relevant metrics.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Magnus
> > > > > > > >
> > > > > > > >
> > > > > > > > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett
> > > > > > > > <bob.barr...@confluent.io.invalid>:
> > > > > > > >
> > > > > > > > > Hi Magnus,
> > > > > > > > >
> > > > > > > > > Thanks for the thorough KIP, this seems very useful.
> > > > > > > > >
> > > > > > > > > Would it make sense to include the request type as a label
> > for
> > > > the
> > > > > > > > > `client.request.success`, `client.request.errors` and
> > > > > > > > `client.request.rtt`
> > > > > > > > > metrics? I think it would be very useful to see which
> > specific
> > > > > > requests
> > > > > > > > are
> > > > > > > > > succeeding and failing for a client. One specific case I can
> > > > think
> > > > > of
> > > > > > > > where
> > > > > > > > > this could be useful is producer batch timeouts. If a Java
> > > > > > application
> > > > > > > > does
> > > > > > > > > not enable producer client logs (unfortunately, in my
> > > experience
> > > > > this
> > > > > > > > > happens more often than it should), the application logs will
> > > > only
> > > > > > > > contain
> > > > > > > > > the expiration error message, but no information about what
> > is
> > > > > > causing
> > > > > > > > the
> > > > > > > > > timeout. The requests might all be succeeding but taking too
> > > long
> > > > > to
> > > > > > > > > process batches, or metadata requests might be failing, or
> > some
> > > > or
> > > > > > all
> > > > > > > > > produce requests might be failing (if the bootstrap servers
> > are
> > > > > > reachable
> > > > > > > > > from the client but one or more other brokers are not, for
> > > > > example).
> > > > > > If
> > > > > > > > the
> > > > > > > > > cluster operator is able to identify the specific requests
> > that
> > > > are
> > > > > > slow
> > > > > > > > or
> > > > > > > > > failing for a client, they will be better able to diagnose
> > the
> > > > > issue
> > > > > > > > > causing batch timeouts.
> > > > > > > > >
> > > > > > > > > One drawback I can think of is that this will increase the
> > > > > > cardinality of
> > > > > > > > > the request metrics. But any given client is only going to
> > use
> > > a
> > > > > > small
> > > > > > > > > subset of the request types, and since we already have
> > > partition
> > > > > > labels
> > > > > > > > for
> > > > > > > > > the topic-level metrics, I think request labels will still
> > make
> > > > up
> > > > > a
> > > > > > > > > relatively small percentage of the set of metrics.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Bob
> > > > > > > > >
> > > > > > > > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass <
> > > > > > > > > viktorsomo...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Magnus,
> > > > > > > > > >
> > > > > > > > > > I think this is a very useful addition. We also have a
> > > similar
> > > > > (but
> > > > > > > > much
> > > > > > > > > > more simplistic) implementation of this. Maybe I missed it
> > in
> > > > the
> > > > > > KIP
> > > > > > > > but
> > > > > > > > > > what about adding metrics about the subscription cache
> > > itself?
> > > > > > That I
> > > > > > > > > think
> > > > > > > > > > would improve its usability and debuggability as we'd be
> > able
> > > > to
> > > > > > see
> > > > > > > > its
> > > > > > > > > > performance, hit/miss rates, eviction counts and others.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Viktor
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill <
> > > > > > mag...@edenhill.se>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Mickael,
> > > > > > > > > > >
> > > > > > > > > > > see inline.
> > > > > > > > > > >
> > > > > > > > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael Maison <
> > > > > > > > > > > mickael.mai...@gmail.com
> > > > > > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Magnus,
> > > > > > > > > > > >
> > > > > > > > > > > > I see you've addressed some of the points I raised
> > above
> > > > but
> > > > > > some
> > > > > > > > (4,
> > > > > > > > > > > > 5) have not been addressed yet.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Re 4) How will the user/app know metrics are being sent.
> > > > > > > > > > >
> > > > > > > > > > > One possibility is to add a JMX metric (thus for user
> > > > > > consumption)
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > > number of metric pushes the
> > > > > > > > > > > client has performed, or perhaps the number of metrics
> > > > > > subscriptions
> > > > > > > > > > > currently being collected.
> > > > > > > > > > > Would that be sufficient?
> > > > > > > > > > >
> > > > > > > > > > > Re 5) Metric sizes and rates
> > > > > > > > > > >
> > > > > > > > > > > A worst case scenario for a producer that is producing to
> > > 50
> > > > > > unique
> > > > > > > > > > topics
> > > > > > > > > > > and emitting all standard metrics yields
> > > > > > > > > > > a serialized size of around 100KB prior to compression,
> > > which
> > > > > > > > > compresses
> > > > > > > > > > > down to about 20-30% of that depending
> > > > > > > > > > > on compression type and topic name uniqueness.
> > > > > > > > > > > The numbers for a consumer would be similar.
> > > > > > > > > > >
> > > > > > > > > > > In practice the number of unique topics would be far
> > less,
> > > > and
> > > > > > the
> > > > > > > > > > > subscription set would typically be for a subset of
> > > metrics.
> > > > > > > > > > > So we're probably closer to 1kb, or less, compressed size
> > > per
> > > > > > client
> > > > > > > > > per
> > > > > > > > > > > push interval.
> > > > > > > > > > >
> > > > > > > > > > > As both the subscription set and push intervals are
> > > > controlled
> > > > > > by the
> > > > > > > > > > > cluster operator it shouldn't be too hard
> > > > > > > > > > > to strike a good balance between metrics overhead and
> > > > > > granularity.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I'm really uneasy with this being enabled by default on
> > > the
> > > > > > client
> > > > > > > > > > > > side. When collecting data, I think the best practice
> > is
> > > to
> > > > > > ensure
> > > > > > > > > > > > users are explicitly enabling it.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Requiring metrics to be explicitly enabled on clients
> > > > severely
> > > > > > > > cripples
> > > > > > > > > > its
> > > > > > > > > > > usability and value.
> > > > > > > > > > >
> > > > > > > > > > > One of the problems that this KIP aims to solve is for
> > > useful
> > > > > > metrics
> > > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > available on demand
> > > > > > > > > > > regardless of the technical expertise of the user. As
> > > Ryanne
> > > > > > points,
> > > > > > > > > out
> > > > > > > > > > a
> > > > > > > > > > > savvy user/organization
> > > > > > > > > > > will typically have metrics collection and monitoring in
> > > > place
> > > > > > > > already,
> > > > > > > > > > and
> > > > > > > > > > > the benefits of this KIP
> > > > > > > > > > > are then more of a common set and format metrics across
> > > > client
> > > > > > > > > > > implementations and languages.
> > > > > > > > > > > But that is not the typical Kafka user in my experience,
> > > > > they're
> > > > > > not
> > > > > > > > > > Kafka
> > > > > > > > > > > experts and they don't have the
> > > > > > > > > > > knowledge of how to best instrument their clients.
> > > > > > > > > > > Having metrics enabled by default for this user base
> > allows
> > > > the
> > > > > > Kafka
> > > > > > > > > > > operators to proactively and reactively
> > > > > > > > > > > monitor and troubleshoot client issues, without the need
> > > for
> > > > > the
> > > > > > less
> > > > > > > > > > savvy
> > > > > > > > > > > user to do anything.
> > > > > > > > > > > It is often too late to tell a user to enable metrics
> > when
> > > > the
> > > > > > > > problem
> > > > > > > > > > has
> > > > > > > > > > > already occurred.
> > > > > > > > > > >
> > > > > > > > > > > Now, to be clear, even though metrics are enabled by
> > > default
> > > > on
> > > > > > > > clients
> > > > > > > > > > it
> > > > > > > > > > > is not enabled by default
> > > > > > > > > > > on the brokers; the Kafka operator needs to build and set
> > > up
> > > > a
> > > > > > > > metrics
> > > > > > > > > > > plugin and add metrics subscriptions
> > > > > > > > > > > before anything is sent from the client.
> > > > > > > > > > > It is opt-out on the clients and opt-in on the broker.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > You mentioned brokers already have
> > > > > > > > > > > > some(most?) of the information contained in metrics, if
> > > so
> > > > > > then why
> > > > > > > > > > > > are we collecting it again? Surely there must be some
> > new
> > > > > > > > information
> > > > > > > > > > > > in the client metrics.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From the user's perspective the Kafka infrastructure
> > > extends
> > > > > from
> > > > > > > > > > > producer.send() to
> > > > > > > > > > > messages being returned from consumer.poll(), a giant
> > black
> > > > box
> > > > > > where
> > > > > > > > > > > there's a lot going on between those
> > > > > > > > > > > two points. The brokers currently only see what happens
> > > once
> > > > > > those
> > > > > > > > > > requests
> > > > > > > > > > > and messages hits the broker,
> > > > > > > > > > > but as Kafka clients are complex pieces of machinery
> > > there's
> > > > a
> > > > > > myriad
> > > > > > > > > of
> > > > > > > > > > > queues, timers, and state
> > > > > > > > > > > that's critical to the operation and infrastructure
> > that's
> > > > not
> > > > > > > > > currently
> > > > > > > > > > > visible to the operator.
> > > > > > > > > > > Relying on the user to accurately and timely provide this
> > > > > missing
> > > > > > > > > > > information is not generally feasible.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Most of the standard metrics listed in the KIP are data
> > > > points
> > > > > > that
> > > > > > > > the
> > > > > > > > > > > broker does not have.
> > > > > > > > > > > Only a small number of metrics are duplicates (like the
> > > > request
> > > > > > > > counts
> > > > > > > > > > and
> > > > > > > > > > > sizes), but they are included
> > > > > > > > > > > to ease correlation when inspecting these client metrics.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Moreover this is a brand new feature so it's even
> > harder
> > > to
> > > > > > justify
> > > > > > > > > > > > enabling it and forcing onto all our users. If disabled
> > > by
> > > > > > default,
> > > > > > > > > > > > it's relatively easy to enable in a new release if we
> > > > decide
> > > > > > to,
> > > > > > > > but
> > > > > > > > > > > > once enabled by default it's much harder to disable.
> > Also
> > > > > this
> > > > > > > > > feature
> > > > > > > > > > > > will apply to all future metrics we will add.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think maturity of a feature implementation should be
> > the
> > > > > > deciding
> > > > > > > > > > factor,
> > > > > > > > > > > rather than
> > > > > > > > > > > the design of it (which this KIP is). I.e., if the
> > > > > > implementation is
> > > > > > > > > not
> > > > > > > > > > > deemed mature enough
> > > > > > > > > > > for release X.Y it will be disabled.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Overall I think it's an interesting feature but I'd
> > > prefer
> > > > to
> > > > > > be
> > > > > > > > > > > > slightly defensive and see how it works in practice
> > > before
> > > > > > enabling
> > > > > > > > > it
> > > > > > > > > > > > everywhere.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Right, and I agree on being defensive, but since this
> > > feature
> > > > > > still
> > > > > > > > > > > requires manual
> > > > > > > > > > > enabling on the brokers before actually being used, I
> > think
> > > > > that
> > > > > > > > gives
> > > > > > > > > > > enough control
> > > > > > > > > > > to opt-in or out of this feature as needed.
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your comments!
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Magnus
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Mickael
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Nov 8, 2021 at 7:55 AM Magnus Edenhill <
> > > > > > mag...@edenhill.se
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks David for pointing this out,
> > > > > > > > > > > > > I've updated the KIP to include client_id as a
> > matching
> > > > > > selector.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Magnus
> > > > > > > > > > > > >
> > > > > > > > > > > > > Den tors 4 nov. 2021 kl 18:01 skrev David Mao
> > > > > > > > > > > <d...@confluent.io.invalid
> > > > > > > > > > > > >:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Magnus,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I noticed that the KIP outlines the initial
> > selectors
> > > > > > supported
> > > > > > > > > as:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - client_instance_id - CLIENT_INSTANCE_ID UUID
> > > > string
> > > > > > > > > > > > representation.
> > > > > > > > > > > > > >    - client_software_name  - client software
> > > > > implementation
> > > > > > > > name.
> > > > > > > > > > > > > >    - client_software_version  - client software
> > > > > > implementation
> > > > > > > > > > > version.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In the given reactive monitoring workflow, we
> > mention
> > > > > that
> > > > > > the
> > > > > > > > > > > > application
> > > > > > > > > > > > > > user does not know their client's client instance
> > ID,
> > > > but
> > > > > > it's
> > > > > > > > > > > outlined
> > > > > > > > > > > > > > that the operator can add a metrics subscription
> > > > > selecting
> > > > > > for
> > > > > > > > > > > > clientId. I
> > > > > > > > > > > > > > don't see clientId as one of the supported
> > selectors.
> > > > > > > > > > > > > > I can see how this would have made sense in a
> > > previous
> > > > > > > > iteration
> > > > > > > > > > > given
> > > > > > > > > > > > that
> > > > > > > > > > > > > > the previous client instance ID proposal was to
> > > > construct
> > > > > > the
> > > > > > > > > > client
> > > > > > > > > > > > > > instance ID using clientId as a prefix. Now that
> > the
> > > > > client
> > > > > > > > > > instance
> > > > > > > > > > > > ID is
> > > > > > > > > > > > > > a UUID, would we want to add clientId as a
> > supported
> > > > > > selector?
> > > > > > > > > > > > > > Let me know what you think.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > David
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 12:33 PM Magnus Edenhill <
> > > > > > > > > > mag...@edenhill.se
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Mickael!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 19:30 skrev Mickael
> > Maison
> > > <
> > > > > > > > > > > > > > > mickael.mai...@gmail.com
> > > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Magnus,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the proposal.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. Looking at the protocol section, isn't
> > > > > > > > "ClientInstanceId"
> > > > > > > > > > > > expected
> > > > > > > > > > > > > > > > to be a field in
> > > > GetTelemetrySubscriptionsResponseV0?
> > > > > > > > > > Otherwise,
> > > > > > > > > > > > how
> > > > > > > > > > > > > > > > does a client retrieve this value?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Good catch, it got removed by mistake in one of
> > the
> > > > > > edits.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. In the client API section, you mention a new
> > > > > method
> > > > > > > > > > > > > > > > "clientInstanceId()". Can you clarify which
> > > > > interfaces
> > > > > > are
> > > > > > > > > > > > affected?
> > > > > > > > > > > > > > > > Is it only Consumer and Producer?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > And Admin. Will update the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. I'm a bit concerned this is enabled by
> > > default.
> > > > > > Even if
> > > > > > > > > the
> > > > > > > > > > > data
> > > > > > > > > > > > > > > > collected is supposed to be not sensitive, I
> > > think
> > > > > > this can
> > > > > > > > > be
> > > > > > > > > > > > > > > > problematic in some environments. Also users
> > > don't
> > > > > > seem to
> > > > > > > > > have
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > choice to only expose some metrics. Knowing how
> > > > much
> > > > > > data
> > > > > > > > > > transit
> > > > > > > > > > > > > > > > through some applications can be considered
> > > > critical.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The broker already knows how much data transits
> > > > through
> > > > > > the
> > > > > > > > > > client
> > > > > > > > > > > > > > though,
> > > > > > > > > > > > > > > right?
> > > > > > > > > > > > > > > Care has been taken not to expose information in
> > > the
> > > > > > standard
> > > > > > > > > > > metrics
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > reveal sensitive information.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Do you have an example of how the proposed
> > metrics
> > > > > could
> > > > > > leak
> > > > > > > > > > > > sensitive
> > > > > > > > > > > > > > > information?
> > > > > > > > > > > > > > > As for limiting the what metrics to export; I
> > guess
> > > > > that
> > > > > > > > could
> > > > > > > > > > make
> > > > > > > > > > > > sense
> > > > > > > > > > > > > > > in some
> > > > > > > > > > > > > > > very sensitive use-cases, but those users might
> > > > disable
> > > > > > > > metrics
> > > > > > > > > > > > > > altogether
> > > > > > > > > > > > > > > for now.
> > > > > > > > > > > > > > > Could these concerns be addressed by a later KIP?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. As a user, how do you know if your
> > application
> > > > is
> > > > > > > > actively
> > > > > > > > > > > > sending
> > > > > > > > > > > > > > > > metrics? Are there new metrics exposing what's
> > > > going
> > > > > > on,
> > > > > > > > like
> > > > > > > > > > how
> > > > > > > > > > > > much
> > > > > > > > > > > > > > > > data is being sent?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That's a good question.
> > > > > > > > > > > > > > > Since the proposed metrics interface is not aimed
> > > at,
> > > > > or
> > > > > > > > > directly
> > > > > > > > > > > > > > available
> > > > > > > > > > > > > > > to, the application
> > > > > > > > > > > > > > > I guess there's little point of adding it here,
> > but
> > > > > > instead
> > > > > > > > > > adding
> > > > > > > > > > > > > > > something to the
> > > > > > > > > > > > > > > existing JMX metrics?
> > > > > > > > > > > > > > > Do you have any suggestions?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 5. If all metrics are enabled on a regular
> > > Consumer
> > > > > or
> > > > > > > > > > Producer,
> > > > > > > > > > > do
> > > > > > > > > > > > > > > > you have an idea how much throughput this would
> > > > use?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It depends on the number of partition/topics/etc
> > > the
> > > > > > client
> > > > > > > > is
> > > > > > > > > > > > producing
> > > > > > > > > > > > > > > to/consuming from.
> > > > > > > > > > > > > > > I'll add some sizes to the KIP for some typical
> > > > > > use-cases.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Magnus
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 5:06 PM Magnus
> > Edenhill <
> > > > > > > > > > > > mag...@edenhill.se>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 13:22 skrev Tom
> > > Bentley <
> > > > > > > > > > > > tbent...@redhat.com
> > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Magnus,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I reviewed the KIP since you called the
> > vote
> > > > > > (sorry for
> > > > > > > > > not
> > > > > > > > > > > > > > reviewing
> > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > you announced your intention to call the
> > > > vote). I
> > > > > > have
> > > > > > > > a
> > > > > > > > > > few
> > > > > > > > > > > > > > > questions
> > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > some of the details.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. There's no Javadoc on
> > > > > > ClientTelemetryPayload.data(),
> > > > > > > > > so
> > > > > > > > > > I
> > > > > > > > > > > > don't
> > > > > > > > > > > > > > > know
> > > > > > > > > > > > > > > > > > whether the payload is exposed through this
> > > > > method
> > > > > > as
> > > > > > > > > > > > compressed or
> > > > > > > > > > > > > > > > not.
> > > > > > > > > > > > > > > > > > Later on you say "Decompression of the
> > > payloads
> > > > > > will be
> > > > > > > > > > > > handled by
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > broker metrics plugin, the broker should
> > > > expose a
> > > > > > > > > suitable
> > > > > > > > > > > > > > > > decompression
> > > > > > > > > > > > > > > > > > API to the metrics plugin for this
> > purpose.",
> > > > > which
> > > > > > > > > > suggests
> > > > > > > > > > > > it's
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > compressed data in the buffer, but then we
> > > > don't
> > > > > > know
> > > > > > > > > which
> > > > > > > > > > > > codec
> > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > used,
> > > > > > > > > > > > > > > > > > nor the API via which the plugin should
> > > > > decompress
> > > > > > it
> > > > > > > > if
> > > > > > > > > > > > required
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > forwarding to the ultimate metrics store.
> > > > Should
> > > > > > the
> > > > > > > > > > > > > > > > ClientTelemetryPayload
> > > > > > > > > > > > > > > > > > expose a method to get the compression and
> > a
> > > > > > > > > decompressor?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Good point, updated.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. The client-side API is expressed as
> > > > > > StringOrError
> > > > > > > > > > > > > > > > > > ClientInstance::ClientInstanceId(int
> > > > > timeout_ms). I
> > > > > > > > > > > understand
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > you're
> > > > > > > > > > > > > > > > > > thinking about the librdkafka
> > implementation,
> > > > but
> > > > > > it
> > > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > > > good
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > show
> > > > > > > > > > > > > > > > > > the API as it would appear on the Apache
> > > Kafka
> > > > > > clients.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This was meant as pseudo-code, but I changed
> > it
> > > > to
> > > > > > Java.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3. "PushTelemetryRequest|Response -
> > protocol
> > > > > > request
> > > > > > > > used
> > > > > > > > > > by
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > client to
> > > > > > > > > > > > > > > > > > send metrics to any broker it is connected
> > > to."
> > > > > To
> > > > > > be
> > > > > > > > > > clear,
> > > > > > > > > > > > this
> > > > > > > > > > > > > > > means
> > > > > > > > > > > > > > > > > > that the client can choose any of the
> > > connected
> > > > > > brokers
> > > > > > > > > and
> > > > > > > > > > > > push to
> > > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > > one of them? What should a supporting
> > client
> > > do
> > > > > if
> > > > > > it
> > > > > > > > > gets
> > > > > > > > > > an
> > > > > > > > > > > > error
> > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > pushing metrics to a broker, retry sending
> > to
> > > > the
> > > > > > same
> > > > > > > > > > broker
> > > > > > > > > > > > or
> > > > > > > > > > > > > > try
> > > > > > > > > > > > > > > > > > pushing to another broker, or drop the
> > > metrics?
> > > > > > Should
> > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > clients
> > > > > > > > > > > > > > > > > > send successive requests to a single
> > broker,
> > > or
> > > > > > round
> > > > > > > > > > robin,
> > > > > > > > > > > > or is
> > > > > > > > > > > > > > > > that up
> > > > > > > > > > > > > > > > > > to the client author? I'm guessing the
> > > > behaviour
> > > > > > should
> > > > > > > > > be
> > > > > > > > > > > > sticky
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > support the rate limiting features, but I
> > > think
> > > > > it
> > > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > > good
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > authors if this section were explicit on
> > the
> > > > > > > > recommended
> > > > > > > > > > > > behaviour.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > You are right, I've updated the KIP to make
> > > this
> > > > > > clearer.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 4. "Mapping the client instance id to an
> > > actual
> > > > > > > > > application
> > > > > > > > > > > > > > instance
> > > > > > > > > > > > > > > > > > running on a (virtual) machine can be done
> > by
> > > > > > > > inspecting
> > > > > > > > > > the
> > > > > > > > > > > > > > metrics
> > > > > > > > > > > > > > > > > > resource labels, such as the client source
> > > > > address
> > > > > > and
> > > > > > > > > > source
> > > > > > > > > > > > port,
> > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > security principal, all of which are added
> > by
> > > > the
> > > > > > > > > receiving
> > > > > > > > > > > > broker.
> > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > will allow the operator together with the
> > > user
> > > > to
> > > > > > > > > identify
> > > > > > > > > > > the
> > > > > > > > > > > > > > actual
> > > > > > > > > > > > > > > > > > application instance." Is this really
> > always
> > > > > true?
> > > > > > The
> > > > > > > > > > source
> > > > > > > > > > > > IP
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > port
> > > > > > > > > > > > > > > > > > might be a loadbalancer/proxy in some
> > setups.
> > > > The
> > > > > > > > > > principal,
> > > > > > > > > > > as
> > > > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > > > mentioned in the KIP, might be shared
> > between
> > > > > > multiple
> > > > > > > > > > > > > > applications.
> > > > > > > > > > > > > > > > So at
> > > > > > > > > > > > > > > > > > worst the organization running the clients
> > > > might
> > > > > > have
> > > > > > > > to
> > > > > > > > > > > > consult
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > logs
> > > > > > > > > > > > > > > > > > of a set of client applications, right?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Yes, that's correct. There's no guaranteed
> > > > mapping
> > > > > > from
> > > > > > > > > > > > > > > > client_instance_id
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > an actual instance, that's why the KIP
> > > recommends
> > > > > > client
> > > > > > > > > > > > > > > implementations
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > log the client instance id
> > > > > > > > > > > > > > > > > upon retrieval, and also provide an API for
> > the
> > > > > > > > application
> > > > > > > > > > to
> > > > > > > > > > > > > > retrieve
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > instance id programmatically
> > > > > > > > > > > > > > > > > if it has a better way of exposing it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 5. "Tests indicate that a compression ratio
> > up
> > > to
> > > > > > 10x is
> > > > > > > > > > > > possible for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > standard metrics." Client authors might
> > > > > appreciate
> > > > > > your
> > > > > > > > > > > > mentioning
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > compression codec got these results.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Good point. Updated.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 6. "Should the client send a push request
> > > prior
> > > > > to
> > > > > > > > expiry
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > previously
> > > > > > > > > > > > > > > > > > calculated PushIntervalMs the broker will
> > > > discard
> > > > > > the
> > > > > > > > > > metrics
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > return a
> > > > > > > > > > > > > > > > > > PushTelemetryResponse with the ErrorCode
> > set
> > > to
> > > > > > > > > > RateLimited."
> > > > > > > > > > > > Is
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > RATE_LIMITED a new error code? It's not
> > > > mentioned
> > > > > > in
> > > > > > > > the
> > > > > > > > > > "New
> > > > > > > > > > > > Error
> > > > > > > > > > > > > > > > Codes"
> > > > > > > > > > > > > > > > > > section.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > That's a leftover, it should be using the
> > > > standard
> > > > > > > > > > ThrottleTime
> > > > > > > > > > > > > > > > mechanism.
> > > > > > > > > > > > > > > > > Fixed.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 7. In the section "Standard client resource
> > > > > labels"
> > > > > > > > > > > > application_id
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > described as Kafka Streams only, but the
> > > > section
> > > > > of
> > > > > > > > > "Client
> > > > > > > > > > > > > > > > Identification"
> > > > > > > > > > > > > > > > > > talks about "application instance id as an
> > > > > optional
> > > > > > > > > future
> > > > > > > > > > > > > > > nice-to-have
> > > > > > > > > > > > > > > > > > that may be included as a metrics label if
> > it
> > > > has
> > > > > > been
> > > > > > > > > set
> > > > > > > > > > by
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > user", so
> > > > > > > > > > > > > > > > > > I'm confused whether non-Kafka Streams
> > > clients
> > > > > > should
> > > > > > > > set
> > > > > > > > > > an
> > > > > > > > > > > > > > > > application_id
> > > > > > > > > > > > > > > > > > or not.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'll clarify this in the KIP, but basically
> > we
> > > > > would
> > > > > > need
> > > > > > > > > to
> > > > > > > > > > > add
> > > > > > > > > > > > an `
> > > > > > > > > > > > > > > > > application.id` config
> > > > > > > > > > > > > > > > > property for non-streams clients for this
> > > > purpose,
> > > > > > and
> > > > > > > > > that's
> > > > > > > > > > > > outside
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > scope of this KIP since we want to make it
> > > > > > zero-conf:ish
> > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > side.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the review,
> > > > > > > > > > > > > > > > > Magnus
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Oct 7, 2021 at 5:26 PM Magnus
> > > Edenhill
> > > > <
> > > > > > > > > > > > mag...@edenhill.se
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I've updated the KIP following our recent
> > > > > > discussions
> > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > > > > > mailing
> > > > > > > > > > > > > > > > > > list:
> > > > > > > > > > > > > > > > > > >  - split the protocol in two, one for
> > > getting
> > > > > the
> > > > > > > > > metrics
> > > > > > > > > > > > > > > > subscriptions,
> > > > > > > > > > > > > > > > > > > and one for pushing the metrics.
> > > > > > > > > > > > > > > > > > >  - simplifications: initially only one
> > > > > supported
> > > > > > > > > metrics
> > > > > > > > > > > > format,
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > client.id in the instance id, etc.
> > > > > > > > > > > > > > > > > > >  - made CLIENT_METRICS subscription
> > > > > configuration
> > > > > > > > > entries
> > > > > > > > > > > > more
> > > > > > > > > > > > > > > > structured
> > > > > > > > > > > > > > > > > > >    and allowing better client matching
> > > > > selectors
> > > > > > (not
> > > > > > > > > > only
> > > > > > > > > > > > on the
> > > > > > > > > > > > > > > > > > instance
> > > > > > > > > > > > > > > > > > > id, but also the other
> > > > > > > > > > > > > > > > > > >    client resource labels, such as
> > > > > > > > > client_software_name,
> > > > > > > > > > > > etc.).
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Unless there are further comments I'll
> > call
> > > > the
> > > > > > vote
> > > > > > > > > in a
> > > > > > > > > > > > day or
> > > > > > > > > > > > > > > two.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > Magnus
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Den mån 4 okt. 2021 kl 20:57 skrev Magnus
> > > > > > Edenhill <
> > > > > > > > > > > > > > > > mag...@edenhill.se>:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Gwen,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'm finishing up the KIP based on the
> > > last
> > > > > > couple
> > > > > > > > of
> > > > > > > > > > > > discussion
> > > > > > > > > > > > > > > > points
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > this thread
> > > > > > > > > > > > > > > > > > > > and will call the Vote later this week.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Magnus
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Den lör 2 okt. 2021 kl 02:01 skrev Gwen
> > > > > Shapira
> > > > > > > > > > > > > > > > > > > <g...@confluent.io.invalid
> > > > > > > > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >> Hey,
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >> I noticed that there was no discussion
> > > for
> > > > > the
> > > > > > > > last
> > > > > > > > > 10
> > > > > > > > > > > > days,
> > > > > > > > > > > > > > > but I
> > > > > > > > > > > > > > > > > > > >> couldn't
> > > > > > > > > > > > > > > > > > > >> find the vote thread. Is there one
> > that
> > > > I'm
> > > > > > > > missing?
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >> Gwen
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >> On Wed, Sep 22, 2021 at 4:58 AM Magnus
> > > > > > Edenhill <
> > > > > > > > > > > > > > > > mag...@edenhill.se>
> > > > > > > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >> > Den tis 21 sep. 2021 kl 06:58 skrev
> > > > Colin
> > > > > > > > McCabe <
> > > > > > > > > > > > > > > > > > cmcc...@apache.org
> > > > > > > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> > > On Mon, Sep 20, 2021, at 17:35,
> > Feng
> > > > Min
> > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >> > > > Thanks Magnus & Colin for the
> > > > > > discussion.
> > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > >> > > > Based on KIP-714's stateless
> > > design,
> > > > > > Client
> > > > > > > > > can
> > > > > > > > > > > > pretty
> > > > > > > > > > > > > > > much
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > > > > >> > > > connection to any broker to send
> > > > > > metrics. We
> > > > > > > > > are
> > > > > > > > > > > not
> > > > > > > > > > > > > > > > associating
> > > > > > > > > > > > > > > > > > > >> > > connection
> > > > > > > > > > > > > > > > > > > >> > > > with client metric state. Is my
> > > > > > > > understanding
> > > > > > > > > > > > correct?
> > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > yes,
> > > > > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > > >> > about
> > > > > > > > > > > > > > > > > > > >> > > > the following two scenarios
> > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > >> > > > 1) One Client (Client-ID)
> > > registers
> > > > > two
> > > > > > > > > > different
> > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > instance
> > > > > > > > > > > > > > > > > > > id
> > > > > > > > > > > > > > > > > > > >> > via
> > > > > > > > > > > > > > > > > > > >> > > > separate registration. Is it
> > > > > permitted?
> > > > > > If
> > > > > > > > OK,
> > > > > > > > > > how
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > distinguish
> > > > > > > > > > > > > > > > > > > >> them
> > > > > > > > > > > > > > > > > > > >> > > from
> > > > > > > > > > > > > > > > > > > >> > > > the case 2 below.
> > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > > Hi Feng,
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > > My understanding, which Magnus can
> > > > > > clarify I
> > > > > > > > > > guess,
> > > > > > > > > > > is
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > >> > have
> > > > > > > > > > > > > > > > > > > >> > > something like two Producer
> > > instances
> > > > > > running
> > > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > client.id
> > > > > > > > > > > > > > > > > > > >> > > (perhaps because they're using the
> > > > same
> > > > > > config
> > > > > > > > > > file,
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > > example).
> > > > > > > > > > > > > > > > > > > >> They
> > > > > > > > > > > > > > > > > > > >> > > could even be in the same process.
> > > But
> > > > > > they
> > > > > > > > > would
> > > > > > > > > > > get
> > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > > > UUIDs.
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > > I believe Magnus used the term
> > > client
> > > > to
> > > > > > mean
> > > > > > > > > > > > "Producer or
> > > > > > > > > > > > > > > > > > > Consumer".
> > > > > > > > > > > > > > > > > > > >> So
> > > > > > > > > > > > > > > > > > > >> > > if you have both a Producer and a
> > > > > > Consumer in
> > > > > > > > > your
> > > > > > > > > > > > > > > > application I
> > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > >> > > expect you'd get separate UUIDs
> > for
> > > > > both.
> > > > > > > > Again
> > > > > > > > > > > > Magnus can
> > > > > > > > > > > > > > > > chime
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > >> > here, I
> > > > > > > > > > > > > > > > > > > >> > > guess.
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> > That's correct.
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > > > 2) How about the client
> > > restarting?
> > > > > > What's
> > > > > > > > the
> > > > > > > > > > > > > > > expectation?
> > > > > > > > > > > > > > > > > > Should
> > > > > > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > > > > > >> > > > server expect the client to
> > carry
> > > a
> > > > > > > > persisted
> > > > > > > > > > > client
> > > > > > > > > > > > > > > > instance id
> > > > > > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > > >> > > should
> > > > > > > > > > > > > > > > > > > >> > > > the client be treated as a new
> > > > > instance?
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > > The KIP doesn't describe any
> > > mechanism
> > > > > for
> > > > > > > > > > > > persistence,
> > > > > > > > > > > > > > so I
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > >> assume
> > > > > > > > > > > > > > > > > > > >> > > that when you restart the client
> > you
> > > > get
> > > > > > a new
> > > > > > > > > > > UUID. I
> > > > > > > > > > > > > > agree
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > >> > would
> > > > > > > > > > > > > > > > > > > >> > > be good to spell this out.
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > >> > Right, it will not be persisted
> > since
> > > a
> > > > > > client
> > > > > > > > > > > instance
> > > > > > > > > > > > > > can't
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > >> restarted.
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> > Will update the KIP to make this
> > > > clearer.
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >> > /Magnus
> > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >> --
> > > > > > > > > > > > > > > > > > > >> Gwen Shapira
> > > > > > > > > > > > > > > > > > > >> Engineering Manager | Confluent
> > > > > > > > > > > > > > > > > > > >> 650.450.2760 | @gwenshap
> > > > > > > > > > > > > > > > > > > >> Follow us: Twitter | blog
> > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to