Hey Jun, see response inline:
Den mån 21 mars 2022 kl 19:31 skrev Jun Rao <j...@confluent.io.invalid>: > Hi, Kirk, Sarat, > > A few more comments. > > 40. GetTelemetrySubscriptionsResponseV0 : RequestedMetrics Array[string] > uses "Array[0] empty string" to represent all metrics subscribed. We had a > similar issue with the topics field in MetadataRequest and used the > following convention. > In version 1 and higher, an empty array indicates "request metadata for no > topics," and a null array is used to indicate "request metadata for all > topics." > Should we use the same convention in GetTelemetrySubscriptionsResponseV0? > Right, I considered this but chose the current design because the subscriptions are prefix-matched, so an empty string will automatically match everything. It is not critical in any way, so if you feel it is better to follow the way MetadataRequest does it, I can change it? > > 41. We include CompressionType in PushTelemetryRequestV0, but not in > ClientTelemetryPayload. How would the implementer know the compression type > for the telemetry payload? > > The PushTelemetryRequest handler decompresses the payload before passing it to the metrics plugin. This was done to avoid having to expose a public decompression interface to metrics plugin developers. > 42. For blocking the metrics for certain clients in the following example, > could you describe the corresponding config value used through the > kafka-config command? > kafka-client-metrics.sh --bootstrap-server $BROKERS \ > --add \ > --name 'Disabe_b69cc35a' \ # A descriptive name makes it easier to > clean up old subscriptions. > --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 \ # > Match this specific client instance > --block > --block will set the "inteval" ConfigEntry to "0", which overrides and disables all accumulated subscriptions for the matching client instance. Thanks, Magnus > On Thu, Mar 10, 2022 at 11:57 AM Jun Rao <j...@confluent.io> wrote: > > > Hi, Kirk, Sarat, > > > > Thanks for the reply. > > > > 28. On the broker, we typically use Yammer metrics. Only for metrics that > > depend on Kafka metric features (e.g., quota), we use the Kafka metric. > > Yammer metrics have 4 types: gauge, meter, histogram and timer. meter > > calculates a rate, but also exposes an accumulated value. > > > > 29. The Histogram class in org.apache.kafka.common.metrics.stats was > never > > used in the client metrics. The implementation of Histogram only > provides a > > fixed number of values in the domain and may not capture the quantiles > very > > accurately. So, we punted on using it. > > > > Thanks, > > > > Jun > > > > > > > > On Thu, Mar 10, 2022 at 10:59 AM Sarat Kakarla > > <skaka...@confluent.io.invalid> wrote: > > > >> Jun, > >> > >> >> 28. For the broker metrics, could you spell out the full metric > name > >> >> including groups, tags, etc? We typically don't add the broker_id > >> label for > >> >> broker metrics. Also, brokers use Yammer metrics, which doesn't > >> have type > >> >> Sum. > >> > >> Sure, I will update the KIP-714 with the above information, will remove > >> the broker-id label from the metrics. > >> > >> Regarding the type is CumulativeSum the right type to use in the place > of > >> Sum? > >> > >> Thanks > >> Sarat > >> > >> > >> On 3/8/22, 5:48 PM, "Jun Rao" <j...@confluent.io.INVALID> wrote: > >> > >> Hi, Magnus, Sarat and Xavier, > >> > >> Thanks for the reply. A few more comments below. > >> > >> 20. It seems that we are piggybacking the plugin on the > >> existing MetricsReporter. So, this seems fine. > >> > >> 21. That could work. Are we requiring any additional jar dependency > >> on the > >> client? Or, are you suggesting that we check the runtime dependency > >> to pick > >> the compression codec? > >> > >> 28. For the broker metrics, could you spell out the full metric name > >> including groups, tags, etc? We typically don't add the broker_id > >> label for > >> broker metrics. Also, brokers use Yammer metrics, which doesn't have > >> type > >> Sum. > >> > >> 29. There are several client metrics listed as histogram. However, > >> the java > >> client currently doesn't support histogram type. > >> > >> 30. Could you show an example of the metric payload in > >> PushTelemetryRequest > >> to help understand how we organize metrics at different levels (per > >> instance, per topic, per partition, per broker, etc)? > >> > >> 31. Could you add a bit more detail on which client thread sends the > >> PushTelemetryRequest? > >> > >> Thanks, > >> > >> Jun > >> > >> On Mon, Mar 7, 2022 at 11:48 AM Magnus Edenhill <mag...@edenhill.se > > > >> wrote: > >> > >> > Hi Jun, > >> > > >> > thanks for your initiated questions, see my answers below. > >> > There's been a number of clarifications to the KIP. > >> > > >> > > >> > > >> > Den tors 27 jan. 2022 kl 20:08 skrev Jun Rao > >> <j...@confluent.io.invalid>: > >> > > >> > > Hi, Magnus, > >> > > > >> > > Thanks for updating the KIP. The overall approach makes sense to > >> me. A > >> > few > >> > > more detailed comments below. > >> > > > >> > > 20. ClientTelemetry: Should it be extending configurable and > >> closable? > >> > > > >> > > >> > I'll pass this question to Sarat and/or Xavier. > >> > > >> > > >> > > >> > > 21. Compression of the metrics on the client: what's the > default? > >> > > > >> > > >> > How about we specify a prioritized list: zstd, lz4, snappy, gzip? > >> > But ultimately it is up to what the client supports. > >> > > >> > > >> > 23. A client instance is considered a metric resource and the > >> > > resource-level (thus client instance level) labels could > include: > >> > > client_software_name=confluent-kafka-python > >> > > client_software_version=v2.1.3 > >> > > client_instance_id=B64CD139-3975-440A-91D4 > >> > > transactional_id=someTxnApp > >> > > Are those labels added in PushTelemetryRequest? If so, are they > >> per > >> > metric > >> > > or per request? > >> > > > >> > > >> > > >> > client_software* and client_instance_id are not added by the > >> client, but > >> > available to > >> > the broker-side metrics plugin for adding as it see fits, remove > >> them from > >> > the KIP. > >> > > >> > As for transactional_id, group_id, etc, which I believe will be > >> useful in > >> > troubleshooting, > >> > are included only once (per push) as resource-level attributes > (the > >> client > >> > instance is a singular resource). > >> > > >> > > >> > > > >> > > 24. "the broker will only send > >> > > GetTelemetrySubscriptionsResponse.DeltaTemporality=True" : > >> > > 24.1 If it's always true, does it need to be part of the > protocol? > >> > > > >> > > >> > We're anticipating that it will take a lot longer to upgrade the > >> majority > >> > of clients than the > >> > broker/plugin side, which is why we want the client to support > both > >> > temporalities out-of-the-box > >> > so that cumulative reporting can be turned on seamlessly in the > >> future. > >> > > >> > > >> > > >> > > 24.2 Does delta only apply to Counter type? > >> > > > >> > > >> > > >> > And Histograms. More details in Xavier's OTLP link. > >> > > >> > > >> > > >> > > 24.3 In the delta representation, the first request needs to > send > >> the > >> > full > >> > > value, how does the broker plugin know whether a value is full > or > >> delta? > >> > > > >> > > >> > The client may (should) send the start time for each metric > sample, > >> > indicating when > >> > the metric began to be collected. > >> > We've discussed whether this should be the client instance start > >> time or > >> > the time when a matching > >> > metric subscription for that metric is received. > >> > For completeness we recommend using the former, the client > instance > >> start > >> > time. > >> > > >> > > >> > > >> > > 25. quota: > >> > > 25.1 Since we are fitting PushTelemetryRequest into the existing > >> request > >> > > quota, it would be useful to document the impact, i.e. client > >> metric > >> > > throttling causes the data from the same client to be delayed. > >> > > 25.2 Is PushTelemetryRequest subject to the write bandwidth > quota > >> like > >> > the > >> > > producer? > >> > > > >> > > >> > > >> > Yes, it should be, as to protect the cluster from rogue clients. > >> > But, in practice the size of metrics will be quite low (e.g., > >> 1-10kb per > >> > 60s interval), so I don't think this will pose a problem. > >> > The KIP has been updated with more details on quota/throttling > >> behaviour, > >> > see the > >> > "Throttling and rate-limiting" section. > >> > > >> > > >> > 25.3 THROTTLING_QUOTA_EXCEEDED: Currently, we don't send this > error > >> when > >> > > the request/bandwidth quota is exceeded since those requests are > >> not > >> > > rejected. We only set this error when the request is rejected > >> (e.g., > >> > topic > >> > > creation). It would be useful to clarify when this error is > used. > >> > > > >> > > >> > Right, I was trying to reuse an existing error-code. We can > >> introduce > >> > a new one for the case where a client pushes metrics at a higher > >> frequency > >> > than the > >> > than the configured push interval (e.g., out-of-profile sends). > >> > This causes the broker to drop those metrics and send this error > >> code back > >> > to the client. There will be no connection throttling / > >> channel-muting in > >> > this > >> > case (unless the standard quotas are exceeded). > >> > > >> > > >> > > 27. kafka-client-metrics.sh: Could we add an example on how to > >> disable a > >> > > bad client? > >> > > > >> > > >> > There's now a --block option to kafka-client-metrics.sh which > >> overrides all > >> > subscriptions > >> > for the matched client(s). This allows silencing metrics for one > or > >> more > >> > clients without having > >> > to remove existing subscriptions. From the client's perspective it > >> will > >> > look like it no longer has > >> > any subscriptions. > >> > > >> > # Block metrics collection for a specific client instance > >> > $ kafka-client-metrics.sh --bootstrap-server $BROKERS \ > >> > --add \ > >> > --name 'Disabe_b69cc35a' \ # A descriptive name makes it > easier > >> to > >> > clean up old subscriptions. > >> > --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 > >> \ # > >> > Match this specific client instance > >> > --block > >> > > >> > > >> > > >> > > >> > > 28. New broker side metrics: Could we spell out the details of > the > >> > metrics > >> > > (e.g., group, tags, etc)? > >> > > > >> > > >> > KIP has been updated accordingly (thanks Sarat). > >> > > >> > > >> > > >> > > > >> > > 29. Client instance-level metrics: client.io.wait.time is a > gauge > >> not a > >> > > histogram. > >> > > > >> > > >> > I believe a population/distribution should preferably be > >> represented as a > >> > histogram, space permitting, > >> > and only secondarily as a Gauge average. > >> > While we might not want to maintain a bunch of histograms for each > >> > partition, since that could be > >> > quite space consuming, this client.io.wait.time is a single metric > >> per > >> > client instance and can > >> > thus afford a Histogram representation. > >> > > >> > > >> > > >> > Thanks, > >> > Magnus > >> > > >> > > >> > > >> > > Thanks, > >> > > > >> > > Jun > >> > > > >> > > On Wed, Jan 26, 2022 at 6:32 AM Magnus Edenhill < > >> mag...@edenhill.se> > >> > > wrote: > >> > > > >> > > > Hi all, > >> > > > > >> > > > I've updated the KIP with responses to the latest comments: > >> Java client > >> > > > dependencies (Thanks Kirk!), alternate designs (separate > >> cluster, > >> > > separate > >> > > > producer, etc), etc. > >> > > > > >> > > > I will revive the vote thread. > >> > > > > >> > > > Thanks, > >> > > > Magnus > >> > > > > >> > > > > >> > > > Den mån 13 dec. 2021 kl 22:32 skrev Ryanne Dolan < > >> > ryannedo...@gmail.com > >> > > >: > >> > > > > >> > > > > I think we should be very careful about introducing new > >> runtime > >> > > > > dependencies into the clients. Historically this has been > >> rare and > >> > > > > essentially necessary (e.g. compression libs). > >> > > > > > >> > > > > Ryanne > >> > > > > > >> > > > > On Mon, Dec 13, 2021, 1:06 PM Kirk True < > >> k...@mustardgrain.com> > >> > wrote: > >> > > > > > >> > > > > > Hi Jun, > >> > > > > > > >> > > > > > On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote: > >> > > > > > > 13. Using OpenTelemetry. Does that require runtime > >> dependency > >> > > > > > > on OpenTelemetry library? How good is the compatibility > >> story > >> > > > > > > of OpenTelemetry? This is important since an application > >> could > >> > have > >> > > > > other > >> > > > > > > OpenTelemetry dependencies than the Kafka client. > >> > > > > > > >> > > > > > The current design is that the OpenTelemetry JARs would > >> ship with > >> > the > >> > > > > > client. Perhaps we can design the client such that the > JARs > >> aren't > >> > > even > >> > > > > > loaded if the user has opted out. The user could even > >> exclude the > >> > > JARs > >> > > > > from > >> > > > > > their dependencies if they so wished. > >> > > > > > > >> > > > > > I can't speak to the compatibility of the libraries. Is it > >> possible > >> > > > that > >> > > > > > we include a shaded version? > >> > > > > > > >> > > > > > Thanks, > >> > > > > > Kirk > >> > > > > > > >> > > > > > > > >> > > > > > > 14. The proposal listed idempotence=true. This is more > of > >> a > >> > > > > configuration > >> > > > > > > than a metric. Are we including that as a metric? What > >> other > >> > > > > > configurations > >> > > > > > > are we including? Should we separate the configurations > >> from the > >> > > > > metrics? > >> > > > > > > > >> > > > > > > Thanks, > >> > > > > > > > >> > > > > > > Jun > >> > > > > > > > >> > > > > > > On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill < > >> > > mag...@edenhill.se> > >> > > > > > wrote: > >> > > > > > > > >> > > > > > > > Hey Bob, > >> > > > > > > > > >> > > > > > > > That's a good point. > >> > > > > > > > > >> > > > > > > > Request type labels were considered but since they're > >> already > >> > > > tracked > >> > > > > > by > >> > > > > > > > broker-side metrics > >> > > > > > > > they were left out as to avoid metric duplication, > >> however > >> > those > >> > > > > > metrics > >> > > > > > > > are not per connection, > >> > > > > > > > so they won't be that useful in practice for > >> troubleshooting > >> > > > specific > >> > > > > > > > client instances. > >> > > > > > > > > >> > > > > > > > I'll add the request_type label to the relevant > metrics. > >> > > > > > > > > >> > > > > > > > Thanks, > >> > > > > > > > Magnus > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett > >> > > > > > > > <bob.barr...@confluent.io.invalid>: > >> > > > > > > > > >> > > > > > > > > Hi Magnus, > >> > > > > > > > > > >> > > > > > > > > Thanks for the thorough KIP, this seems very useful. > >> > > > > > > > > > >> > > > > > > > > Would it make sense to include the request type as a > >> label > >> > for > >> > > > the > >> > > > > > > > > `client.request.success`, `client.request.errors` > and > >> > > > > > > > `client.request.rtt` > >> > > > > > > > > metrics? I think it would be very useful to see > which > >> > specific > >> > > > > > requests > >> > > > > > > > are > >> > > > > > > > > succeeding and failing for a client. One specific > >> case I can > >> > > > think > >> > > > > of > >> > > > > > > > where > >> > > > > > > > > this could be useful is producer batch timeouts. If > a > >> Java > >> > > > > > application > >> > > > > > > > does > >> > > > > > > > > not enable producer client logs (unfortunately, in > my > >> > > experience > >> > > > > this > >> > > > > > > > > happens more often than it should), the application > >> logs will > >> > > > only > >> > > > > > > > contain > >> > > > > > > > > the expiration error message, but no information > >> about what > >> > is > >> > > > > > causing > >> > > > > > > > the > >> > > > > > > > > timeout. The requests might all be succeeding but > >> taking too > >> > > long > >> > > > > to > >> > > > > > > > > process batches, or metadata requests might be > >> failing, or > >> > some > >> > > > or > >> > > > > > all > >> > > > > > > > > produce requests might be failing (if the bootstrap > >> servers > >> > are > >> > > > > > reachable > >> > > > > > > > > from the client but one or more other brokers are > >> not, for > >> > > > > example). > >> > > > > > If > >> > > > > > > > the > >> > > > > > > > > cluster operator is able to identify the specific > >> requests > >> > that > >> > > > are > >> > > > > > slow > >> > > > > > > > or > >> > > > > > > > > failing for a client, they will be better able to > >> diagnose > >> > the > >> > > > > issue > >> > > > > > > > > causing batch timeouts. > >> > > > > > > > > > >> > > > > > > > > One drawback I can think of is that this will > >> increase the > >> > > > > > cardinality of > >> > > > > > > > > the request metrics. But any given client is only > >> going to > >> > use > >> > > a > >> > > > > > small > >> > > > > > > > > subset of the request types, and since we already > have > >> > > partition > >> > > > > > labels > >> > > > > > > > for > >> > > > > > > > > the topic-level metrics, I think request labels will > >> still > >> > make > >> > > > up > >> > > > > a > >> > > > > > > > > relatively small percentage of the set of metrics. > >> > > > > > > > > > >> > > > > > > > > Thanks, > >> > > > > > > > > Bob > >> > > > > > > > > > >> > > > > > > > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass > < > >> > > > > > > > > viktorsomo...@gmail.com> > >> > > > > > > > > wrote: > >> > > > > > > > > > >> > > > > > > > > > Hi Magnus, > >> > > > > > > > > > > >> > > > > > > > > > I think this is a very useful addition. We also > >> have a > >> > > similar > >> > > > > (but > >> > > > > > > > much > >> > > > > > > > > > more simplistic) implementation of this. Maybe I > >> missed it > >> > in > >> > > > the > >> > > > > > KIP > >> > > > > > > > but > >> > > > > > > > > > what about adding metrics about the subscription > >> cache > >> > > itself? > >> > > > > > That I > >> > > > > > > > > think > >> > > > > > > > > > would improve its usability and debuggability as > >> we'd be > >> > able > >> > > > to > >> > > > > > see > >> > > > > > > > its > >> > > > > > > > > > performance, hit/miss rates, eviction counts and > >> others. > >> > > > > > > > > > > >> > > > > > > > > > Best, > >> > > > > > > > > > Viktor > >> > > > > > > > > > > >> > > > > > > > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill < > >> > > > > > mag...@edenhill.se> > >> > > > > > > > > > wrote: > >> > > > > > > > > > > >> > > > > > > > > > > Hi Mickael, > >> > > > > > > > > > > > >> > > > > > > > > > > see inline. > >> > > > > > > > > > > > >> > > > > > > > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael > >> Maison < > >> > > > > > > > > > > mickael.mai...@gmail.com > >> > > > > > > > > > > >: > >> > > > > > > > > > > > >> > > > > > > > > > > > Hi Magnus, > >> > > > > > > > > > > > > >> > > > > > > > > > > > I see you've addressed some of the points I > >> raised > >> > above > >> > > > but > >> > > > > > some > >> > > > > > > > (4, > >> > > > > > > > > > > > 5) have not been addressed yet. > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Re 4) How will the user/app know metrics are > >> being sent. > >> > > > > > > > > > > > >> > > > > > > > > > > One possibility is to add a JMX metric (thus for > >> user > >> > > > > > consumption) > >> > > > > > > > for > >> > > > > > > > > > the > >> > > > > > > > > > > number of metric pushes the > >> > > > > > > > > > > client has performed, or perhaps the number of > >> metrics > >> > > > > > subscriptions > >> > > > > > > > > > > currently being collected. > >> > > > > > > > > > > Would that be sufficient? > >> > > > > > > > > > > > >> > > > > > > > > > > Re 5) Metric sizes and rates > >> > > > > > > > > > > > >> > > > > > > > > > > A worst case scenario for a producer that is > >> producing to > >> > > 50 > >> > > > > > unique > >> > > > > > > > > > topics > >> > > > > > > > > > > and emitting all standard metrics yields > >> > > > > > > > > > > a serialized size of around 100KB prior to > >> compression, > >> > > which > >> > > > > > > > > compresses > >> > > > > > > > > > > down to about 20-30% of that depending > >> > > > > > > > > > > on compression type and topic name uniqueness. > >> > > > > > > > > > > The numbers for a consumer would be similar. > >> > > > > > > > > > > > >> > > > > > > > > > > In practice the number of unique topics would be > >> far > >> > less, > >> > > > and > >> > > > > > the > >> > > > > > > > > > > subscription set would typically be for a subset > >> of > >> > > metrics. > >> > > > > > > > > > > So we're probably closer to 1kb, or less, > >> compressed size > >> > > per > >> > > > > > client > >> > > > > > > > > per > >> > > > > > > > > > > push interval. > >> > > > > > > > > > > > >> > > > > > > > > > > As both the subscription set and push intervals > >> are > >> > > > controlled > >> > > > > > by the > >> > > > > > > > > > > cluster operator it shouldn't be too hard > >> > > > > > > > > > > to strike a good balance between metrics > overhead > >> and > >> > > > > > granularity. > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > I'm really uneasy with this being enabled by > >> default on > >> > > the > >> > > > > > client > >> > > > > > > > > > > > side. When collecting data, I think the best > >> practice > >> > is > >> > > to > >> > > > > > ensure > >> > > > > > > > > > > > users are explicitly enabling it. > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Requiring metrics to be explicitly enabled on > >> clients > >> > > > severely > >> > > > > > > > cripples > >> > > > > > > > > > its > >> > > > > > > > > > > usability and value. > >> > > > > > > > > > > > >> > > > > > > > > > > One of the problems that this KIP aims to solve > >> is for > >> > > useful > >> > > > > > metrics > >> > > > > > > > > to > >> > > > > > > > > > be > >> > > > > > > > > > > available on demand > >> > > > > > > > > > > regardless of the technical expertise of the > >> user. As > >> > > Ryanne > >> > > > > > points, > >> > > > > > > > > out > >> > > > > > > > > > a > >> > > > > > > > > > > savvy user/organization > >> > > > > > > > > > > will typically have metrics collection and > >> monitoring in > >> > > > place > >> > > > > > > > already, > >> > > > > > > > > > and > >> > > > > > > > > > > the benefits of this KIP > >> > > > > > > > > > > are then more of a common set and format metrics > >> across > >> > > > client > >> > > > > > > > > > > implementations and languages. > >> > > > > > > > > > > But that is not the typical Kafka user in my > >> experience, > >> > > > > they're > >> > > > > > not > >> > > > > > > > > > Kafka > >> > > > > > > > > > > experts and they don't have the > >> > > > > > > > > > > knowledge of how to best instrument their > clients. > >> > > > > > > > > > > Having metrics enabled by default for this user > >> base > >> > allows > >> > > > the > >> > > > > > Kafka > >> > > > > > > > > > > operators to proactively and reactively > >> > > > > > > > > > > monitor and troubleshoot client issues, without > >> the need > >> > > for > >> > > > > the > >> > > > > > less > >> > > > > > > > > > savvy > >> > > > > > > > > > > user to do anything. > >> > > > > > > > > > > It is often too late to tell a user to enable > >> metrics > >> > when > >> > > > the > >> > > > > > > > problem > >> > > > > > > > > > has > >> > > > > > > > > > > already occurred. > >> > > > > > > > > > > > >> > > > > > > > > > > Now, to be clear, even though metrics are > enabled > >> by > >> > > default > >> > > > on > >> > > > > > > > clients > >> > > > > > > > > > it > >> > > > > > > > > > > is not enabled by default > >> > > > > > > > > > > on the brokers; the Kafka operator needs to > build > >> and set > >> > > up > >> > > > a > >> > > > > > > > metrics > >> > > > > > > > > > > plugin and add metrics subscriptions > >> > > > > > > > > > > before anything is sent from the client. > >> > > > > > > > > > > It is opt-out on the clients and opt-in on the > >> broker. > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > You mentioned brokers already have > >> > > > > > > > > > > > some(most?) of the information contained in > >> metrics, if > >> > > so > >> > > > > > then why > >> > > > > > > > > > > > are we collecting it again? Surely there must > >> be some > >> > new > >> > > > > > > > information > >> > > > > > > > > > > > in the client metrics. > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > From the user's perspective the Kafka > >> infrastructure > >> > > extends > >> > > > > from > >> > > > > > > > > > > producer.send() to > >> > > > > > > > > > > messages being returned from consumer.poll(), a > >> giant > >> > black > >> > > > box > >> > > > > > where > >> > > > > > > > > > > there's a lot going on between those > >> > > > > > > > > > > two points. The brokers currently only see what > >> happens > >> > > once > >> > > > > > those > >> > > > > > > > > > requests > >> > > > > > > > > > > and messages hits the broker, > >> > > > > > > > > > > but as Kafka clients are complex pieces of > >> machinery > >> > > there's > >> > > > a > >> > > > > > myriad > >> > > > > > > > > of > >> > > > > > > > > > > queues, timers, and state > >> > > > > > > > > > > that's critical to the operation and > >> infrastructure > >> > that's > >> > > > not > >> > > > > > > > > currently > >> > > > > > > > > > > visible to the operator. > >> > > > > > > > > > > Relying on the user to accurately and timely > >> provide this > >> > > > > missing > >> > > > > > > > > > > information is not generally feasible. > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Most of the standard metrics listed in the KIP > >> are data > >> > > > points > >> > > > > > that > >> > > > > > > > the > >> > > > > > > > > > > broker does not have. > >> > > > > > > > > > > Only a small number of metrics are duplicates > >> (like the > >> > > > request > >> > > > > > > > counts > >> > > > > > > > > > and > >> > > > > > > > > > > sizes), but they are included > >> > > > > > > > > > > to ease correlation when inspecting these client > >> metrics. > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > Moreover this is a brand new feature so it's > >> even > >> > harder > >> > > to > >> > > > > > justify > >> > > > > > > > > > > > enabling it and forcing onto all our users. If > >> disabled > >> > > by > >> > > > > > default, > >> > > > > > > > > > > > it's relatively easy to enable in a new > release > >> if we > >> > > > decide > >> > > > > > to, > >> > > > > > > > but > >> > > > > > > > > > > > once enabled by default it's much harder to > >> disable. > >> > Also > >> > > > > this > >> > > > > > > > > feature > >> > > > > > > > > > > > will apply to all future metrics we will add. > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > I think maturity of a feature implementation > >> should be > >> > the > >> > > > > > deciding > >> > > > > > > > > > factor, > >> > > > > > > > > > > rather than > >> > > > > > > > > > > the design of it (which this KIP is). I.e., if > the > >> > > > > > implementation is > >> > > > > > > > > not > >> > > > > > > > > > > deemed mature enough > >> > > > > > > > > > > for release X.Y it will be disabled. > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > Overall I think it's an interesting feature > but > >> I'd > >> > > prefer > >> > > > to > >> > > > > > be > >> > > > > > > > > > > > slightly defensive and see how it works in > >> practice > >> > > before > >> > > > > > enabling > >> > > > > > > > > it > >> > > > > > > > > > > > everywhere. > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Right, and I agree on being defensive, but since > >> this > >> > > feature > >> > > > > > still > >> > > > > > > > > > > requires manual > >> > > > > > > > > > > enabling on the brokers before actually being > >> used, I > >> > think > >> > > > > that > >> > > > > > > > gives > >> > > > > > > > > > > enough control > >> > > > > > > > > > > to opt-in or out of this feature as needed. > >> > > > > > > > > > > > >> > > > > > > > > > > Thanks for your comments! > >> > > > > > > > > > > > >> > > > > > > > > > > Regards, > >> > > > > > > > > > > Magnus > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > Thanks, > >> > > > > > > > > > > > Mickael > >> > > > > > > > > > > > > >> > > > > > > > > > > > On Mon, Nov 8, 2021 at 7:55 AM Magnus > Edenhill < > >> > > > > > mag...@edenhill.se > >> > > > > > > > > > >> > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks David for pointing this out, > >> > > > > > > > > > > > > I've updated the KIP to include client_id > as a > >> > matching > >> > > > > > selector. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Regards, > >> > > > > > > > > > > > > Magnus > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Den tors 4 nov. 2021 kl 18:01 skrev David > Mao > >> > > > > > > > > > > <d...@confluent.io.invalid > >> > > > > > > > > > > > >: > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > Hey Magnus, > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > I noticed that the KIP outlines the > initial > >> > selectors > >> > > > > > supported > >> > > > > > > > > as: > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > - client_instance_id - > >> CLIENT_INSTANCE_ID UUID > >> > > > string > >> > > > > > > > > > > > representation. > >> > > > > > > > > > > > > > - client_software_name - client > software > >> > > > > implementation > >> > > > > > > > name. > >> > > > > > > > > > > > > > - client_software_version - client > >> software > >> > > > > > implementation > >> > > > > > > > > > > version. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > In the given reactive monitoring workflow, > >> we > >> > mention > >> > > > > that > >> > > > > > the > >> > > > > > > > > > > > application > >> > > > > > > > > > > > > > user does not know their client's client > >> instance > >> > ID, > >> > > > but > >> > > > > > it's > >> > > > > > > > > > > outlined > >> > > > > > > > > > > > > > that the operator can add a metrics > >> subscription > >> > > > > selecting > >> > > > > > for > >> > > > > > > > > > > > clientId. I > >> > > > > > > > > > > > > > don't see clientId as one of the supported > >> > selectors. > >> > > > > > > > > > > > > > I can see how this would have made sense > in > >> a > >> > > previous > >> > > > > > > > iteration > >> > > > > > > > > > > given > >> > > > > > > > > > > > that > >> > > > > > > > > > > > > > the previous client instance ID proposal > >> was to > >> > > > construct > >> > > > > > the > >> > > > > > > > > > client > >> > > > > > > > > > > > > > instance ID using clientId as a prefix. > Now > >> that > >> > the > >> > > > > client > >> > > > > > > > > > instance > >> > > > > > > > > > > > ID is > >> > > > > > > > > > > > > > a UUID, would we want to add clientId as a > >> > supported > >> > > > > > selector? > >> > > > > > > > > > > > > > Let me know what you think. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > David > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 12:33 PM Magnus > >> Edenhill < > >> > > > > > > > > > mag...@edenhill.se > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Hi Mickael! > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 19:30 skrev > >> Mickael > >> > Maison > >> > > < > >> > > > > > > > > > > > > > > mickael.mai...@gmail.com > >> > > > > > > > > > > > > > > >: > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hi Magnus, > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks for the proposal. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 1. Looking at the protocol section, > >> isn't > >> > > > > > > > "ClientInstanceId" > >> > > > > > > > > > > > expected > >> > > > > > > > > > > > > > > > to be a field in > >> > > > GetTelemetrySubscriptionsResponseV0? > >> > > > > > > > > > Otherwise, > >> > > > > > > > > > > > how > >> > > > > > > > > > > > > > > > does a client retrieve this value? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Good catch, it got removed by mistake in > >> one of > >> > the > >> > > > > > edits. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 2. In the client API section, you > >> mention a new > >> > > > > method > >> > > > > > > > > > > > > > > > "clientInstanceId()". Can you clarify > >> which > >> > > > > interfaces > >> > > > > > are > >> > > > > > > > > > > > affected? > >> > > > > > > > > > > > > > > > Is it only Consumer and Producer? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > And Admin. Will update the KIP. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 3. I'm a bit concerned this is enabled > >> by > >> > > default. > >> > > > > > Even if > >> > > > > > > > > the > >> > > > > > > > > > > data > >> > > > > > > > > > > > > > > > collected is supposed to be not > >> sensitive, I > >> > > think > >> > > > > > this can > >> > > > > > > > > be > >> > > > > > > > > > > > > > > > problematic in some environments. Also > >> users > >> > > don't > >> > > > > > seem to > >> > > > > > > > > have > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > > choice to only expose some metrics. > >> Knowing how > >> > > > much > >> > > > > > data > >> > > > > > > > > > transit > >> > > > > > > > > > > > > > > > through some applications can be > >> considered > >> > > > critical. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > The broker already knows how much data > >> transits > >> > > > through > >> > > > > > the > >> > > > > > > > > > client > >> > > > > > > > > > > > > > though, > >> > > > > > > > > > > > > > > right? > >> > > > > > > > > > > > > > > Care has been taken not to expose > >> information in > >> > > the > >> > > > > > standard > >> > > > > > > > > > > metrics > >> > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > might > >> > > > > > > > > > > > > > > reveal sensitive information. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Do you have an example of how the > proposed > >> > metrics > >> > > > > could > >> > > > > > leak > >> > > > > > > > > > > > sensitive > >> > > > > > > > > > > > > > > information? > >> > > > > > > > > > > > > > > As for limiting the what metrics to > >> export; I > >> > guess > >> > > > > that > >> > > > > > > > could > >> > > > > > > > > > make > >> > > > > > > > > > > > sense > >> > > > > > > > > > > > > > > in some > >> > > > > > > > > > > > > > > very sensitive use-cases, but those > users > >> might > >> > > > disable > >> > > > > > > > metrics > >> > > > > > > > > > > > > > altogether > >> > > > > > > > > > > > > > > for now. > >> > > > > > > > > > > > > > > Could these concerns be addressed by a > >> later KIP? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 4. As a user, how do you know if your > >> > application > >> > > > is > >> > > > > > > > actively > >> > > > > > > > > > > > sending > >> > > > > > > > > > > > > > > > metrics? Are there new metrics > exposing > >> what's > >> > > > going > >> > > > > > on, > >> > > > > > > > like > >> > > > > > > > > > how > >> > > > > > > > > > > > much > >> > > > > > > > > > > > > > > > data is being sent? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > That's a good question. > >> > > > > > > > > > > > > > > Since the proposed metrics interface is > >> not aimed > >> > > at, > >> > > > > or > >> > > > > > > > > directly > >> > > > > > > > > > > > > > available > >> > > > > > > > > > > > > > > to, the application > >> > > > > > > > > > > > > > > I guess there's little point of adding > it > >> here, > >> > but > >> > > > > > instead > >> > > > > > > > > > adding > >> > > > > > > > > > > > > > > something to the > >> > > > > > > > > > > > > > > existing JMX metrics? > >> > > > > > > > > > > > > > > Do you have any suggestions? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 5. If all metrics are enabled on a > >> regular > >> > > Consumer > >> > > > > or > >> > > > > > > > > > Producer, > >> > > > > > > > > > > do > >> > > > > > > > > > > > > > > > you have an idea how much throughput > >> this would > >> > > > use? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > It depends on the number of > >> partition/topics/etc > >> > > the > >> > > > > > client > >> > > > > > > > is > >> > > > > > > > > > > > producing > >> > > > > > > > > > > > > > > to/consuming from. > >> > > > > > > > > > > > > > > I'll add some sizes to the KIP for some > >> typical > >> > > > > > use-cases. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Thanks, > >> > > > > > > > > > > > > > > Magnus > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 5:06 PM Magnus > >> > Edenhill < > >> > > > > > > > > > > > mag...@edenhill.se> > >> > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 13:22 skrev > >> Tom > >> > > Bentley < > >> > > > > > > > > > > > tbent...@redhat.com > >> > > > > > > > > > > > > > >: > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi Magnus, > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > I reviewed the KIP since you > called > >> the > >> > vote > >> > > > > > (sorry for > >> > > > > > > > > not > >> > > > > > > > > > > > > > reviewing > >> > > > > > > > > > > > > > > > when > >> > > > > > > > > > > > > > > > > > you announced your intention to > >> call the > >> > > > vote). I > >> > > > > > have > >> > > > > > > > a > >> > > > > > > > > > few > >> > > > > > > > > > > > > > > questions > >> > > > > > > > > > > > > > > > on > >> > > > > > > > > > > > > > > > > > some of the details. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 1. There's no Javadoc on > >> > > > > > ClientTelemetryPayload.data(), > >> > > > > > > > > so > >> > > > > > > > > > I > >> > > > > > > > > > > > don't > >> > > > > > > > > > > > > > > know > >> > > > > > > > > > > > > > > > > > whether the payload is exposed > >> through this > >> > > > > method > >> > > > > > as > >> > > > > > > > > > > > compressed or > >> > > > > > > > > > > > > > > > not. > >> > > > > > > > > > > > > > > > > > Later on you say "Decompression of > >> the > >> > > payloads > >> > > > > > will be > >> > > > > > > > > > > > handled by > >> > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > broker metrics plugin, the broker > >> should > >> > > > expose a > >> > > > > > > > > suitable > >> > > > > > > > > > > > > > > > decompression > >> > > > > > > > > > > > > > > > > > API to the metrics plugin for this > >> > purpose.", > >> > > > > which > >> > > > > > > > > > suggests > >> > > > > > > > > > > > it's > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > compressed data in the buffer, but > >> then we > >> > > > don't > >> > > > > > know > >> > > > > > > > > which > >> > > > > > > > > > > > codec > >> > > > > > > > > > > > > > was > >> > > > > > > > > > > > > > > > used, > >> > > > > > > > > > > > > > > > > > nor the API via which the plugin > >> should > >> > > > > decompress > >> > > > > > it > >> > > > > > > > if > >> > > > > > > > > > > > required > >> > > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > > > forwarding to the ultimate metrics > >> store. > >> > > > Should > >> > > > > > the > >> > > > > > > > > > > > > > > > ClientTelemetryPayload > >> > > > > > > > > > > > > > > > > > expose a method to get the > >> compression and > >> > a > >> > > > > > > > > decompressor? > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Good point, updated. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 2. The client-side API is > expressed > >> as > >> > > > > > StringOrError > >> > > > > > > > > > > > > > > > > > > ClientInstance::ClientInstanceId(int > >> > > > > timeout_ms). I > >> > > > > > > > > > > understand > >> > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > you're > >> > > > > > > > > > > > > > > > > > thinking about the librdkafka > >> > implementation, > >> > > > but > >> > > > > > it > >> > > > > > > > > would > >> > > > > > > > > > be > >> > > > > > > > > > > > good > >> > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > show > >> > > > > > > > > > > > > > > > > > the API as it would appear on the > >> Apache > >> > > Kafka > >> > > > > > clients. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > This was meant as pseudo-code, but I > >> changed > >> > it > >> > > > to > >> > > > > > Java. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 3. "PushTelemetryRequest|Response > - > >> > protocol > >> > > > > > request > >> > > > > > > > used > >> > > > > > > > > > by > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > client to > >> > > > > > > > > > > > > > > > > > send metrics to any broker it is > >> connected > >> > > to." > >> > > > > To > >> > > > > > be > >> > > > > > > > > > clear, > >> > > > > > > > > > > > this > >> > > > > > > > > > > > > > > means > >> > > > > > > > > > > > > > > > > > that the client can choose any of > >> the > >> > > connected > >> > > > > > brokers > >> > > > > > > > > and > >> > > > > > > > > > > > push to > >> > > > > > > > > > > > > > > > just > >> > > > > > > > > > > > > > > > > > one of them? What should a > >> supporting > >> > client > >> > > do > >> > > > > if > >> > > > > > it > >> > > > > > > > > gets > >> > > > > > > > > > an > >> > > > > > > > > > > > error > >> > > > > > > > > > > > > > > > when > >> > > > > > > > > > > > > > > > > > pushing metrics to a broker, retry > >> sending > >> > to > >> > > > the > >> > > > > > same > >> > > > > > > > > > broker > >> > > > > > > > > > > > or > >> > > > > > > > > > > > > > try > >> > > > > > > > > > > > > > > > > > pushing to another broker, or drop > >> the > >> > > metrics? > >> > > > > > Should > >> > > > > > > > > > > > supporting > >> > > > > > > > > > > > > > > > clients > >> > > > > > > > > > > > > > > > > > send successive requests to a > single > >> > broker, > >> > > or > >> > > > > > round > >> > > > > > > > > > robin, > >> > > > > > > > > > > > or is > >> > > > > > > > > > > > > > > > that up > >> > > > > > > > > > > > > > > > > > to the client author? I'm guessing > >> the > >> > > > behaviour > >> > > > > > should > >> > > > > > > > > be > >> > > > > > > > > > > > sticky > >> > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > > support the rate limiting > features, > >> but I > >> > > think > >> > > > > it > >> > > > > > > > would > >> > > > > > > > > be > >> > > > > > > > > > > > good > >> > > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > client > >> > > > > > > > > > > > > > > > > > authors if this section were > >> explicit on > >> > the > >> > > > > > > > recommended > >> > > > > > > > > > > > behaviour. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > You are right, I've updated the KIP > >> to make > >> > > this > >> > > > > > clearer. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 4. "Mapping the client instance id > >> to an > >> > > actual > >> > > > > > > > > application > >> > > > > > > > > > > > > > instance > >> > > > > > > > > > > > > > > > > > running on a (virtual) machine can > >> be done > >> > by > >> > > > > > > > inspecting > >> > > > > > > > > > the > >> > > > > > > > > > > > > > metrics > >> > > > > > > > > > > > > > > > > > resource labels, such as the > client > >> source > >> > > > > address > >> > > > > > and > >> > > > > > > > > > source > >> > > > > > > > > > > > port, > >> > > > > > > > > > > > > > > or > >> > > > > > > > > > > > > > > > > > security principal, all of which > >> are added > >> > by > >> > > > the > >> > > > > > > > > receiving > >> > > > > > > > > > > > broker. > >> > > > > > > > > > > > > > > > This > >> > > > > > > > > > > > > > > > > > will allow the operator together > >> with the > >> > > user > >> > > > to > >> > > > > > > > > identify > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > actual > >> > > > > > > > > > > > > > > > > > application instance." Is this > >> really > >> > always > >> > > > > true? > >> > > > > > The > >> > > > > > > > > > source > >> > > > > > > > > > > > IP > >> > > > > > > > > > > > > > and > >> > > > > > > > > > > > > > > > port > >> > > > > > > > > > > > > > > > > > might be a loadbalancer/proxy in > >> some > >> > setups. > >> > > > The > >> > > > > > > > > > principal, > >> > > > > > > > > > > as > >> > > > > > > > > > > > > > > already > >> > > > > > > > > > > > > > > > > > mentioned in the KIP, might be > >> shared > >> > between > >> > > > > > multiple > >> > > > > > > > > > > > > > applications. > >> > > > > > > > > > > > > > > > So at > >> > > > > > > > > > > > > > > > > > worst the organization running the > >> clients > >> > > > might > >> > > > > > have > >> > > > > > > > to > >> > > > > > > > > > > > consult > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > logs > >> > > > > > > > > > > > > > > > > > of a set of client applications, > >> right? > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Yes, that's correct. There's no > >> guaranteed > >> > > > mapping > >> > > > > > from > >> > > > > > > > > > > > > > > > client_instance_id > >> > > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > an actual instance, that's why the > KIP > >> > > recommends > >> > > > > > client > >> > > > > > > > > > > > > > > implementations > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > log the client instance id > >> > > > > > > > > > > > > > > > > upon retrieval, and also provide an > >> API for > >> > the > >> > > > > > > > application > >> > > > > > > > > > to > >> > > > > > > > > > > > > > retrieve > >> > > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > instance id programmatically > >> > > > > > > > > > > > > > > > > if it has a better way of exposing > it. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 5. "Tests indicate that a > compression > >> ratio > >> > up > >> > > to > >> > > > > > 10x is > >> > > > > > > > > > > > possible for > >> > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > standard metrics." Client authors > >> might > >> > > > > appreciate > >> > > > > > your > >> > > > > > > > > > > > mentioning > >> > > > > > > > > > > > > > > > which > >> > > > > > > > > > > > > > > > > > compression codec got these > results. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Good point. Updated. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 6. "Should the client send a push > >> request > >> > > prior > >> > > > > to > >> > > > > > > > expiry > >> > > > > > > > > > of > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > previously > >> > > > > > > > > > > > > > > > > > calculated PushIntervalMs the > >> broker will > >> > > > discard > >> > > > > > the > >> > > > > > > > > > metrics > >> > > > > > > > > > > > and > >> > > > > > > > > > > > > > > > return a > >> > > > > > > > > > > > > > > > > > PushTelemetryResponse with the > >> ErrorCode > >> > set > >> > > to > >> > > > > > > > > > RateLimited." > >> > > > > > > > > > > > Is > >> > > > > > > > > > > > > > this > >> > > > > > > > > > > > > > > > > > RATE_LIMITED a new error code? > It's > >> not > >> > > > mentioned > >> > > > > > in > >> > > > > > > > the > >> > > > > > > > > > "New > >> > > > > > > > > > > > Error > >> > > > > > > > > > > > > > > > Codes" > >> > > > > > > > > > > > > > > > > > section. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > That's a leftover, it should be > using > >> the > >> > > > standard > >> > > > > > > > > > ThrottleTime > >> > > > > > > > > > > > > > > > mechanism. > >> > > > > > > > > > > > > > > > > Fixed. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 7. In the section "Standard client > >> resource > >> > > > > labels" > >> > > > > > > > > > > > application_id > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > > > > described as Kafka Streams only, > >> but the > >> > > > section > >> > > > > of > >> > > > > > > > > "Client > >> > > > > > > > > > > > > > > > Identification" > >> > > > > > > > > > > > > > > > > > talks about "application instance > >> id as an > >> > > > > optional > >> > > > > > > > > future > >> > > > > > > > > > > > > > > nice-to-have > >> > > > > > > > > > > > > > > > > > that may be included as a metrics > >> label if > >> > it > >> > > > has > >> > > > > > been > >> > > > > > > > > set > >> > > > > > > > > > by > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > user", so > >> > > > > > > > > > > > > > > > > > I'm confused whether non-Kafka > >> Streams > >> > > clients > >> > > > > > should > >> > > > > > > > set > >> > > > > > > > > > an > >> > > > > > > > > > > > > > > > application_id > >> > > > > > > > > > > > > > > > > > or not. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > I'll clarify this in the KIP, but > >> basically > >> > we > >> > > > > would > >> > > > > > need > >> > > > > > > > > to > >> > > > > > > > > > > add > >> > > > > > > > > > > > an ` > >> > > > > > > > > > > > > > > > > application.id` config > >> > > > > > > > > > > > > > > > > property for non-streams clients for > >> this > >> > > > purpose, > >> > > > > > and > >> > > > > > > > > that's > >> > > > > > > > > > > > outside > >> > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > scope of this KIP since we want to > >> make it > >> > > > > > zero-conf:ish > >> > > > > > > > on > >> > > > > > > > > > the > >> > > > > > > > > > > > > > client > >> > > > > > > > > > > > > > > > side. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Kind regards, > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Tom > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Thanks for the review, > >> > > > > > > > > > > > > > > > > Magnus > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > On Thu, Oct 7, 2021 at 5:26 PM > >> Magnus > >> > > Edenhill > >> > > > < > >> > > > > > > > > > > > mag...@edenhill.se > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Hi all, > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > I've updated the KIP following > >> our recent > >> > > > > > discussions > >> > > > > > > > > on > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > mailing > >> > > > > > > > > > > > > > > > > > list: > >> > > > > > > > > > > > > > > > > > > - split the protocol in two, > one > >> for > >> > > getting > >> > > > > the > >> > > > > > > > > metrics > >> > > > > > > > > > > > > > > > subscriptions, > >> > > > > > > > > > > > > > > > > > > and one for pushing the metrics. > >> > > > > > > > > > > > > > > > > > > - simplifications: initially > >> only one > >> > > > > supported > >> > > > > > > > > metrics > >> > > > > > > > > > > > format, > >> > > > > > > > > > > > > > no > >> > > > > > > > > > > > > > > > > > > client.id in the instance id, > >> etc. > >> > > > > > > > > > > > > > > > > > > - made CLIENT_METRICS > >> subscription > >> > > > > configuration > >> > > > > > > > > entries > >> > > > > > > > > > > > more > >> > > > > > > > > > > > > > > > structured > >> > > > > > > > > > > > > > > > > > > and allowing better client > >> matching > >> > > > > selectors > >> > > > > > (not > >> > > > > > > > > > only > >> > > > > > > > > > > > on the > >> > > > > > > > > > > > > > > > > > instance > >> > > > > > > > > > > > > > > > > > > id, but also the other > >> > > > > > > > > > > > > > > > > > > client resource labels, such > as > >> > > > > > > > > client_software_name, > >> > > > > > > > > > > > etc.). > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Unless there are further > comments > >> I'll > >> > call > >> > > > the > >> > > > > > vote > >> > > > > > > > > in a > >> > > > > > > > > > > > day or > >> > > > > > > > > > > > > > > two. > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Regards, > >> > > > > > > > > > > > > > > > > > > Magnus > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Den mån 4 okt. 2021 kl 20:57 > >> skrev Magnus > >> > > > > > Edenhill < > >> > > > > > > > > > > > > > > > mag...@edenhill.se>: > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Hi Gwen, > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > I'm finishing up the KIP based > >> on the > >> > > last > >> > > > > > couple > >> > > > > > > > of > >> > > > > > > > > > > > discussion > >> > > > > > > > > > > > > > > > points > >> > > > > > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > > > > > this thread > >> > > > > > > > > > > > > > > > > > > > and will call the Vote later > >> this week. > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Best, > >> > > > > > > > > > > > > > > > > > > > Magnus > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Den lör 2 okt. 2021 kl 02:01 > >> skrev Gwen > >> > > > > Shapira > >> > > > > > > > > > > > > > > > > > > <g...@confluent.io.invalid > >> > > > > > > > > > > > > > > > > > > > >: > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> Hey, > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> I noticed that there was no > >> discussion > >> > > for > >> > > > > the > >> > > > > > > > last > >> > > > > > > > > 10 > >> > > > > > > > > > > > days, > >> > > > > > > > > > > > > > > but I > >> > > > > > > > > > > > > > > > > > > >> couldn't > >> > > > > > > > > > > > > > > > > > > >> find the vote thread. Is > there > >> one > >> > that > >> > > > I'm > >> > > > > > > > missing? > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> Gwen > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> On Wed, Sep 22, 2021 at 4:58 > >> AM Magnus > >> > > > > > Edenhill < > >> > > > > > > > > > > > > > > > mag...@edenhill.se> > >> > > > > > > > > > > > > > > > > > > >> wrote: > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > Den tis 21 sep. 2021 kl > >> 06:58 skrev > >> > > > Colin > >> > > > > > > > McCabe < > >> > > > > > > > > > > > > > > > > > cmcc...@apache.org > >> > > > > > > > > > > > > > > > > > > >: > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > On Mon, Sep 20, 2021, at > >> 17:35, > >> > Feng > >> > > > Min > >> > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > >> > > > Thanks Magnus & Colin > >> for the > >> > > > > > discussion. > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > Based on KIP-714's > >> stateless > >> > > design, > >> > > > > > Client > >> > > > > > > > > can > >> > > > > > > > > > > > pretty > >> > > > > > > > > > > > > > > much > >> > > > > > > > > > > > > > > > use > >> > > > > > > > > > > > > > > > > > > any > >> > > > > > > > > > > > > > > > > > > >> > > > connection to any > broker > >> to send > >> > > > > > metrics. We > >> > > > > > > > > are > >> > > > > > > > > > > not > >> > > > > > > > > > > > > > > > associating > >> > > > > > > > > > > > > > > > > > > >> > > connection > >> > > > > > > > > > > > > > > > > > > >> > > > with client metric > >> state. Is my > >> > > > > > > > understanding > >> > > > > > > > > > > > correct? > >> > > > > > > > > > > > > > If > >> > > > > > > > > > > > > > > > yes, > >> > > > > > > > > > > > > > > > > > > how > >> > > > > > > > > > > > > > > > > > > >> > about > >> > > > > > > > > > > > > > > > > > > >> > > > the following two > >> scenarios > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > 1) One Client > (Client-ID) > >> > > registers > >> > > > > two > >> > > > > > > > > > different > >> > > > > > > > > > > > client > >> > > > > > > > > > > > > > > > > > instance > >> > > > > > > > > > > > > > > > > > > id > >> > > > > > > > > > > > > > > > > > > >> > via > >> > > > > > > > > > > > > > > > > > > >> > > > separate registration. > >> Is it > >> > > > > permitted? > >> > > > > > If > >> > > > > > > > OK, > >> > > > > > > > > > how > >> > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > > distinguish > >> > > > > > > > > > > > > > > > > > > >> them > >> > > > > > > > > > > > > > > > > > > >> > > from > >> > > > > > > > > > > > > > > > > > > >> > > > the case 2 below. > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > Hi Feng, > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > My understanding, which > >> Magnus can > >> > > > > > clarify I > >> > > > > > > > > > guess, > >> > > > > > > > > > > is > >> > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > you > >> > > > > > > > > > > > > > > > > > > could > >> > > > > > > > > > > > > > > > > > > >> > have > >> > > > > > > > > > > > > > > > > > > >> > > something like two > Producer > >> > > instances > >> > > > > > running > >> > > > > > > > > with > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > same > >> > > > > > > > > > > > > > > > > > > client.id > >> > > > > > > > > > > > > > > > > > > >> > > (perhaps because they're > >> using the > >> > > > same > >> > > > > > config > >> > > > > > > > > > file, > >> > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > example). > >> > > > > > > > > > > > > > > > > > > >> They > >> > > > > > > > > > > > > > > > > > > >> > > could even be in the same > >> process. > >> > > But > >> > > > > > they > >> > > > > > > > > would > >> > > > > > > > > > > get > >> > > > > > > > > > > > > > > separate > >> > > > > > > > > > > > > > > > > > > UUIDs. > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > I believe Magnus used the > >> term > >> > > client > >> > > > to > >> > > > > > mean > >> > > > > > > > > > > > "Producer or > >> > > > > > > > > > > > > > > > > > > Consumer". > >> > > > > > > > > > > > > > > > > > > >> So > >> > > > > > > > > > > > > > > > > > > >> > > if you have both a > >> Producer and a > >> > > > > > Consumer in > >> > > > > > > > > your > >> > > > > > > > > > > > > > > > application I > >> > > > > > > > > > > > > > > > > > > would > >> > > > > > > > > > > > > > > > > > > >> > > expect you'd get separate > >> UUIDs > >> > for > >> > > > > both. > >> > > > > > > > Again > >> > > > > > > > > > > > Magnus can > >> > > > > > > > > > > > > > > > chime > >> > > > > > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > > > > >> > here, I > >> > > > > > > > > > > > > > > > > > > >> > > guess. > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > That's correct. > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > 2) How about the client > >> > > restarting? > >> > > > > > What's > >> > > > > > > > the > >> > > > > > > > > > > > > > > expectation? > >> > > > > > > > > > > > > > > > > > Should > >> > > > > > > > > > > > > > > > > > > >> the > >> > > > > > > > > > > > > > > > > > > >> > > > server expect the > client > >> to > >> > carry > >> > > a > >> > > > > > > > persisted > >> > > > > > > > > > > client > >> > > > > > > > > > > > > > > > instance id > >> > > > > > > > > > > > > > > > > > > or > >> > > > > > > > > > > > > > > > > > > >> > > should > >> > > > > > > > > > > > > > > > > > > >> > > > the client be treated > as > >> a new > >> > > > > instance? > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > The KIP doesn't describe > >> any > >> > > mechanism > >> > > > > for > >> > > > > > > > > > > > persistence, > >> > > > > > > > > > > > > > so I > >> > > > > > > > > > > > > > > > would > >> > > > > > > > > > > > > > > > > > > >> assume > >> > > > > > > > > > > > > > > > > > > >> > > that when you restart the > >> client > >> > you > >> > > > get > >> > > > > > a new > >> > > > > > > > > > > UUID. I > >> > > > > > > > > > > > > > agree > >> > > > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > > > it > >> > > > > > > > > > > > > > > > > > > >> > would > >> > > > > > > > > > > > > > > > > > > >> > > be good to spell this > out. > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > Right, it will not be > >> persisted > >> > since > >> > > a > >> > > > > > client > >> > > > > > > > > > > instance > >> > > > > > > > > > > > > > can't > >> > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > > > >> restarted. > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > Will update the KIP to make > >> this > >> > > > clearer. > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > /Magnus > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> -- > >> > > > > > > > > > > > > > > > > > > >> Gwen Shapira > >> > > > > > > > > > > > > > > > > > > >> Engineering Manager | > Confluent > >> > > > > > > > > > > > > > > > > > > >> 650.450.2760 | @gwenshap > >> > > > > > > > > > > > > > > > > > > >> Follow us: Twitter | blog > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >