[ 
https://issues.apache.org/jira/browse/GOBBLIN-1087?focusedWorklogId=404860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404860
 ]

ASF GitHub Bot logged work on GOBBLIN-1087:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Mar/20 17:10
            Start Date: 17/Mar/20 17:10
    Worklog Time Spent: 10m 
      Work Description: sv2000 commented on pull request #2928: GOBBLIN-1087: 
Track and report histogram of observed lag from Gobblin…
URL: https://github.com/apache/incubator-gobblin/pull/2928#discussion_r393837517
 
 

 ##########
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##########
 @@ -398,5 +480,8 @@ public void reset() {
     for (int partitionIdx = 0; partitionIdx < this.partitions.size(); partitionIdx++) {
       resetStartFetchEpochTime(partitionIdx);
     }
+    if (this.observedLagHistogram != null) {
+      this.observedLagHistogram.reset();
 
 Review comment:
   Added a benchmark comparing reset() against creating a new Histogram. Both 
are cheap, but reset() is 3x cheaper than creating a new object: it keeps the 
already-allocated counts array and zeroes it in place. In general, it is 
better to avoid new object creation, since it can lead to extra GC activity 
and memory fragmentation over time.
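
   To illustrate the point about reset keeping the allocated array, here is a 
minimal sketch. It assumes a simplified fixed-bucket histogram; the class 
LagHistogramSketch and its methods are hypothetical and are not the 
HdrHistogram API or the code in this PR.

```java
import java.util.Arrays;

/**
 * Hypothetical fixed-bucket histogram used only for illustration.
 * This is NOT HdrHistogram; it only mirrors the "zero the array in place" idea.
 */
final class LagHistogramSketch {
  // Allocated once in the constructor and reused across reset() calls.
  private final long[] counts;
  private long totalCount;

  LagHistogramSketch(int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    this.counts = new long[numBuckets];
  }

  /** Records one value; out-of-range values are clamped to the first or last bucket. */
  void record(long value) {
    int idx = (int) Math.max(0L, Math.min(value, counts.length - 1));
    counts[idx]++;
    totalCount++;
  }

  long getTotalCount() {
    return totalCount;
  }

  long getCountAt(int bucket) {
    return counts[bucket];
  }

  /** Zeroes the existing counts array in place; nothing new is allocated. */
  void reset() {
    Arrays.fill(counts, 0L);
    totalCount = 0;
  }

  public static void main(String[] args) {
    LagHistogramSketch histogram = new LagHistogramSketch(16);
    histogram.record(3);
    histogram.record(3);
    histogram.record(40); // clamped into the last bucket (index 15)
    System.out.println("before reset: " + histogram.getTotalCount());
    histogram.reset();
    System.out.println("after reset: " + histogram.getTotalCount());
  }
}
```

   The alternative, replacing the histogram with a freshly constructed one on 
every reset, would allocate a new counts array each time and leave the old one 
for the garbage collector.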
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 404860)
    Time Spent: 1h 10m  (was: 1h)

> Track and report histogram of observed lag from Gobblin Kafka pipeline
> ----------------------------------------------------------------------
>
>                 Key: GOBBLIN-1087
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1087
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-kafka
>    Affects Versions: 0.15.0
>            Reporter: Sudarshan Vasudevan
>            Assignee: Shirshanka Das
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In this PR, we instrument the KafkaExtractor to track the observed latency of 
> Kafka consumer records processed by the pipeline. Here, observed latency is 
> measured as the time difference between processing time of the record and the 
> original creation time. The latency distribution is tracked in an 
> HdrHistogram, which is serialized into a string when emitted as part of a 
> GobblinTrackingEvent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
