[ https://issues.apache.org/jira/browse/KAFKA-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451356#comment-16451356 ]
ASF GitHub Bot commented on KAFKA-6376:
---------------------------------------

guozhangwang closed pull request #4922: KAFKA-6376: Document skipped records metrics changes
URL: https://github.com/apache/kafka/pull/4922

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/docs/ops.html b/docs/ops.html
index 6ffe97653e6..450a268a2a1 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -1353,7 +1353,12 @@ <h5><a id="kafka_streams_thread_monitoring" href="#kafka_streams_thread_monitori
 </tr>
 <tr>
 <td>skipped-records-rate</td>
- <td>The average number of skipped records per second. </td>
+ <td>The average number of skipped records per second.</td>
+ <td>kafka.streams:type=stream-metrics,client-id=([-.\w]+)</td>
+ </tr>
+ <tr>
+ <td>skipped-records-total</td>
+ <td>The total number of skipped records.</td>
 <td>kafka.streams:type=stream-metrics,client-id=([-.\w]+)</td>
 </tr>
 </tbody>
diff --git a/docs/streams/upgrade-guide.html b/docs/streams/upgrade-guide.html
index 7ffafb547a8..565bd0b263c 100644
--- a/docs/streams/upgrade-guide.html
+++ b/docs/streams/upgrade-guide.html
@@ -101,6 +101,37 @@ <h1>Upgrade Guide and API Changes</h1>
 <!-- TODO: verify release verion and update `id` and `href` attributes (also at other places that link to this headline) -->
 <h3><a id="streams_api_changes_120" href="#streams_api_changes_120">Streams API changes in 1.2.0</a></h3>
+ <p>
+ We have removed the <code>skippedDueToDeserializationError-rate</code> and <code>skippedDueToDeserializationError-total</code> metrics.
+ Deserialization errors, and all other causes of record skipping, are now accounted for in the pre-existing metrics
+ <code>skipped-records-rate</code> and <code>skipped-records-total</code>. When a record is skipped, the event is
+ now logged at WARN level. If these warnings become burdensome, we recommend explicitly filtering out unprocessable
+ records instead of depending on record skipping semantics. For more details, see
+ <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics">KIP-274</a>.
+ As of right now, the potential causes of skipped records are:
+ </p>
+ <ul>
+ <li><code>null</code> keys in table sources</li>
+ <li><code>null</code> keys in table-table inner/left/outer/right joins</li>
+ <li><code>null</code> keys or values in stream-table joins</li>
+ <li><code>null</code> keys or values in stream-stream joins</li>
+ <li><code>null</code> keys or values in aggregations on grouped streams</li>
+ <li><code>null</code> keys or values in reductions on grouped streams</li>
+ <li><code>null</code> keys in aggregations on windowed streams</li>
+ <li><code>null</code> keys in reductions on windowed streams</li>
+ <li><code>null</code> keys in aggregations on session-windowed streams</li>
+ <li>
+ Errors producing results, when the configured <code>default.production.exception.handler</code> decides to
+ <code>CONTINUE</code> (the default is to <code>FAIL</code> and throw an exception).
+ </li>
+ <li>
+ Errors deserializing records, when the configured <code>default.deserialization.exception.handler</code>
+ decides to <code>CONTINUE</code> (the default is to <code>FAIL</code> and throw an exception).
+ This was the case previously captured in the <code>skippedDueToDeserializationError</code> metrics.
+ </li>
+ <li>Fetched records having a negative timestamp.</li>
+ </ul>
+ <p>
 We have added support for methods in <code>ReadOnlyWindowStore</code> which allows for querying a single window's key-value pair.
 For users who have customized window store implementations on the above interface, they'd need to update their code to implement the newly added method as well.
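
The upgrade note in the diff recommends explicitly filtering out unprocessable records instead of relying on skip semantics. The following is a minimal, dependency-free sketch of that idea, not code from the PR. The class and method names (ExplicitRecordFilter, isProcessable, dropUnprocessable) are hypothetical, and a plain list of Map.Entry objects stands in for a stream of records. In an actual Kafka Streams topology the same null check would presumably be passed to KStream#filter ahead of the join or aggregation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExplicitRecordFilter {

    /** Hypothetical check: treat a record as processable only if key and value are both non-null. */
    static <K, V> boolean isProcessable(K key, V value) {
        return key != null && value != null;
    }

    /** Returns a new list holding only the processable records, in their original order. */
    static <K, V> List<Map.Entry<K, V>> dropUnprocessable(List<Map.Entry<K, V>> records) {
        List<Map.Entry<K, V>> kept = new ArrayList<>();
        for (Map.Entry<K, V> record : records) {
            if (isProcessable(record.getKey(), record.getValue())) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative input: one good record, one with a null key, one with a null value.
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("user-1", "click"));
        records.add(new SimpleEntry<>(null, "click"));
        records.add(new SimpleEntry<>("user-2", null));

        // Only the first record survives the explicit filter.
        System.out.println(dropUnprocessable(records).size()); // prints 1
    }
}
```

Filtering up front like this would keep such records from reaching the operators listed above, so they would neither trigger the WARN log nor count toward skipped-records-total. Whether a null value should be dropped depends on the operator, since the list above shows that some operators skip only on null keys.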

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve Streams metrics for skipped records
> -------------------------------------------
>
> Key: KAFKA-6376
> URL: https://issues.apache.org/jira/browse/KAFKA-6376
> Project: Kafka
> Issue Type: Improvement
> Components: metrics, streams
> Affects Versions: 1.0.0
> Reporter: Matthias J. Sax
> Assignee: John Roesler
> Priority: Major
> Labels: kip
> Fix For: 1.2.0
>
>
> Copy this from KIP-210 discussion thread:
> {quote}
> Note that currently we have two metrics for `skipped-records` on different
> levels:
> 1) on the highest level, the thread-level, we have a `skipped-records`,
> that records all the skipped records due to deserialization errors.
> 2) on the lower processor-node level, we have a
> `skippedDueToDeserializationError`, that records the skipped records on
> that specific source node due to deserialization errors.
> So you can see that 1) does not cover any other scenarios and can just be
> thought of as an aggregate of 2) across all the tasks' source nodes.
> However, there are other places that can cause a record to be dropped, for
> example:
> 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
> dropped due to window elapsed.
> 2) KIP-210: records could be dropped on the producer side.
> 3) records could be dropped during user-customized processing on errors.
> {quote}
> [~guozhang] Not sure what you mean by "3) records could be dropped during
> user-customized processing on errors."
> Btw: we also drop record with {{null}} key and/or value for certain DSL
> operations. This should be included as well.
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)