[ https://issues.apache.org/jira/browse/GOBBLIN-1087?focusedWorklogId=404860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404860 ]
ASF GitHub Bot logged work on GOBBLIN-1087:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Mar/20 17:10
            Start Date: 17/Mar/20 17:10
    Worklog Time Spent: 10m
      Work Description: sv2000 commented on pull request #2928: GOBBLIN-1087: Track and report histogram of observed lag from Gobblin…
URL: https://github.com/apache/incubator-gobblin/pull/2928#discussion_r393837517

 ##########
 File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##########
 @@ -398,5 +480,8 @@ public void reset() {
      for (int partitionIdx = 0; partitionIdx < this.partitions.size(); partitionIdx++) {
        resetStartFetchEpochTime(partitionIdx);
      }
 +    if (this.observedLagHistogram != null) {
 +      this.observedLagHistogram.reset();

 Review comment:
   Added a benchmark comparing reset() against creating a new Histogram. Both are cheap, but reset() is 3x cheaper than creating a new object: it keeps the already-allocated count array and simply zeroes it out. In general, it is better to avoid creating new objects, since that avoids the GCs and memory fragmentation that can occur over time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 404860)
    Time Spent: 1h 10m  (was: 1h)

> Track and report histogram of observed lag from Gobblin Kafka pipeline
> ----------------------------------------------------------------------
>
>                 Key: GOBBLIN-1087
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1087
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-kafka
>    Affects Versions: 0.15.0
>            Reporter: Sudarshan Vasudevan
>            Assignee: Shirshanka Das
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In this PR, we instrument the KafkaExtractor to track the observed latency of
> Kafka consumer records processed by the pipeline. Here, observed latency is
> measured as the time difference between processing time of the record and the
> original creation time. The latency distribution is tracked in an
> HdrHistogram, which is serialized into a string when emitted as part of a
> GobblinTrackingEvent.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
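
For readers following the thread, here is a minimal Java sketch of the two ideas discussed above: computing observed lag as processing time minus record creation time, and resetting a histogram in place instead of allocating a new one. The class LagHistogramSketch, its fixed-width bucket layout, and the clamping of negative lag to zero are all assumptions made for illustration. It is not the HdrHistogram API and not the code in KafkaExtractorStatsTracker.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a latency histogram; NOT the HdrHistogram API. */
final class LagHistogramSketch {
    private final long[] counts;      // one counter per bucket, allocated once
    private final long bucketWidthMs; // width of each bucket in milliseconds
    private long totalCount;

    LagHistogramSketch(int numBuckets, long bucketWidthMs) {
        this.counts = new long[numBuckets];
        this.bucketWidthMs = bucketWidthMs;
    }

    /**
     * Observed lag = processing time minus record creation time.
     * Clamping at zero (e.g. for clock skew) is an assumption of this sketch.
     */
    static long observedLagMs(long processingTimeMs, long recordCreationTimeMs) {
        return Math.max(0L, processingTimeMs - recordCreationTimeMs);
    }

    /** Records one lag value; values past the last bucket land in the last bucket. */
    void record(long lagMs) {
        long clamped = Math.max(0L, lagMs);
        int idx = (int) Math.min(counts.length - 1, clamped / bucketWidthMs);
        counts[idx]++;
        totalCount++;
    }

    long countInBucket(int idx) {
        return counts[idx];
    }

    long totalCount() {
        return totalCount;
    }

    /** Zeroes the counters in place; the backing array is reused, nothing new is allocated. */
    void reset() {
        Arrays.fill(counts, 0L);
        totalCount = 0;
    }

    public static void main(String[] args) {
        LagHistogramSketch h = new LagHistogramSketch(10, 100);
        h.record(observedLagMs(1_250L, 1_000L)); // lag of 250 ms
        h.record(observedLagMs(5_000L, 1_000L)); // lag of 4000 ms, past the last bucket
        System.out.println("recorded=" + h.totalCount());
        h.reset(); // same object and same array, ready for the next reporting interval
        System.out.println("after reset=" + h.totalCount());
    }
}
```

The alternative to reset() would be assigning a fresh LagHistogramSketch on every reporting interval, which allocates a new count array each time. The sketch only shows why the in-place variant avoids that allocation; it makes no claim about the 3x figure reported in the review comment.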