apoorvmittal10 commented on code in PR #18444:
URL: https://github.com/apache/kafka/pull/18444#discussion_r1909573910


##########
core/src/main/java/kafka/server/share/SharePartitionManager.java:
##########
@@ -693,6 +717,38 @@ private static void removeSharePartitionFromCache(
         }
     }
 
+    /**
+     * The handler to update the failed share fetch request metrics.
+     *
+     * @return A BiConsumer that updates the failed share fetch request metrics.
+     */
+    private BiConsumer<Collection<TopicIdPartition>, Boolean> failedShareFetchMetricsHandler() {
+        return (topicIdPartitions, allTopicPartitionsFailed) -> {
+            // Update failed share fetch request metric.
+            topicIdPartitions.forEach(topicIdPartition ->
+                brokerTopicStats.topicStats(topicIdPartition.topicPartition().topic()).failedShareFetchRequestRate().mark());
+            if (allTopicPartitionsFailed) {

Review Comment:
   Yes, I thought about it while implementing, and I also saw that for regular fetch the failed fetch metric is marked whenever the fetch for any topic fails. I was thinking in terms of how the metrics are used. Take `failedShareFetchRequestRate`: if we always mark the all-topic metric as failed whenever any one topic fetch fails, then the two metrics might not yield much additional value, and the topic metric becomes more like a log that helps debug which topic fetch has failed.
   
   Marking the all-topic metric as failed whenever any topic fetch fails would be desirable if the complete request failed on any topic fetch failure, which seems to be the case for regular [fetch](https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/core/src/main/scala/kafka/server/ReplicaManager.scala#L1890). But that is not true for share-fetch. My thinking was that, for share-fetch, a failure in the all-topic metric is critical (the complete fetch/acknowledge request failed), whereas a failure in a topic-level metric tells the operator to debug what caused that particular topic to fail (for a single-topic request, both metrics yield the same result).
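To make the difference between the two policies concrete, here is a minimal, self-contained Java sketch. It does not use Kafka's `BrokerTopicStats` or real meters: `Counter`, `TopicPartitionId`, `FailedFetchStats` and both `mark…` methods are hypothetical stand-ins, and `markAnyFailure` is only a simplified version of the "mark the all-topic meter on any topic failure" behaviour referred to above, not the actual `ReplicaManager` code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailedFetchMetricsSketch {

    // Hypothetical stand-in for a rate meter: it only counts mark() calls.
    static final class Counter {
        private long count;
        void mark() { count++; }
        long count() { return count; }
    }

    // Hypothetical stand-in for a topic-partition identifier.
    record TopicPartitionId(String topic, int partition) { }

    static final class FailedFetchStats {
        final Map<String, Counter> perTopic = new HashMap<>();
        final Counter allTopics = new Counter();

        Counter topic(String topic) {
            return perTopic.computeIfAbsent(topic, t -> new Counter());
        }

        // Simplified "any failure" policy: a single failed partition also marks the all-topic meter.
        void markAnyFailure(Collection<TopicPartitionId> failed) {
            failed.forEach(tp -> topic(tp.topic()).mark());
            if (!failed.isEmpty()) {
                allTopics.mark();
            }
        }

        // Share-fetch policy as described in the comment: the all-topic meter is
        // marked once, and only when every partition in the request failed.
        void markShareFetchFailure(Collection<TopicPartitionId> failed, boolean allTopicPartitionsFailed) {
            failed.forEach(tp -> topic(tp.topic()).mark());
            if (allTopicPartitionsFailed) {
                allTopics.mark();
            }
        }
    }

    public static void main(String[] args) {
        // One request fetches topics "a" and "b"; only partition a-0 fails.
        List<TopicPartitionId> failed = List.of(new TopicPartitionId("a", 0));

        FailedFetchStats anyFailureStyle = new FailedFetchStats();
        anyFailureStyle.markAnyFailure(failed);

        FailedFetchStats shareFetchStyle = new FailedFetchStats();
        shareFetchStyle.markShareFetchFailure(failed, false);

        System.out.println(anyFailureStyle.allTopics.count()); // 1
        System.out.println(shareFetchStyle.allTopics.count()); // 0
    }
}
```

With one of two topics failing, only the first policy moves the all-topic meter. Under the share-fetch policy the all-topic meter moves only when the whole request failed, so a non-zero value there points at fully failed requests, while the per-topic meter still shows which topic is misbehaving.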
   
   I also don't think bumping the all-topic stats for each topic-partition in a request is right (https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/core/src/main/scala/kafka/server/ReplicaManager.scala#L1453), as it gives an incorrect request rate for the overall metric. So I avoided that in the implementation as well.
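A minimal Java sketch of the request-rate concern, assuming hypothetical `Counter`, `TopicPartitionId` and `Stats` stand-ins rather than Kafka's `BrokerTopicStats` and meters. It marks an all-topic meter either once per topic-partition or once per request; with three partitions in a single request, the per-partition approach records three marks for what is one request.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllTopicRateSketch {

    // Hypothetical counter standing in for a rate meter; it only counts mark() calls.
    static final class Counter {
        private long count;
        void mark() { count++; }
        long count() { return count; }
    }

    // Hypothetical topic-partition identifier.
    record TopicPartitionId(String topic, int partition) { }

    static final class Stats {
        final Map<String, Counter> perTopic = new HashMap<>();
        final Counter allTopics = new Counter();

        Counter topic(String name) {
            return perTopic.computeIfAbsent(name, n -> new Counter());
        }

        // Marks the all-topic meter once per partition: N partitions -> N marks.
        void markAllTopicsPerPartition(List<TopicPartitionId> partitions) {
            for (TopicPartitionId tp : partitions) {
                topic(tp.topic()).mark();
                allTopics.mark();
            }
        }

        // Marks the all-topic meter once per request, regardless of partition count.
        void markAllTopicsPerRequest(List<TopicPartitionId> partitions) {
            partitions.forEach(tp -> topic(tp.topic()).mark());
            allTopics.mark();
        }
    }

    public static void main(String[] args) {
        // A single request touching three partitions across two topics.
        List<TopicPartitionId> request = List.of(
            new TopicPartitionId("a", 0),
            new TopicPartitionId("a", 1),
            new TopicPartitionId("b", 0));

        Stats perPartition = new Stats();
        perPartition.markAllTopicsPerPartition(request);

        Stats perRequest = new Stats();
        perRequest.markAllTopicsPerRequest(request);

        System.out.println(perPartition.allTopics.count()); // 3
        System.out.println(perRequest.allTopics.count());   // 1
    }
}
```

The per-topic counters are identical in both variants; only the all-topic meter differs, which is why counting it per partition inflates the overall rate.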
   
   I know it's different from what we currently have, but I am struggling to find the value in the existing implementation. I might be missing something, so I'm open to suggestions and happy to correct things.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
