[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339165 ]
ASF GitHub Bot logged work on GOBBLIN-945:
------------------------------------------
Author: ASF GitHub Bot
Created on: 06/Nov/19 05:53
Start Date: 06/Nov/19 05:53
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2795: GOBBLIN-945:
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449
##########
File path:
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
##########
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
    @Override
    public void close() throws IOException {
 -    if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
 -      updateStatisticsForCurrentPartition();
 +    if (!allPartitionsFinished()) {
Review comment:
Yes, the current implementation is confusing once the end of the partitions
is reached: it calls updateStatisticsForCurrentPartition(), but the method
effectively does nothing because recordCount == 0. IMO the change is more
readable.
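
The point can be illustrated with a minimal sketch. It uses a hypothetical stand-in class, not the real KafkaExtractor: the fields, the consume helpers, and the counters are assumptions made for illustration, and because the diff above is cut off after the new `if`, the sketch also assumes the guarded body still calls updateStatisticsForCurrentPartition().

```java
// Hypothetical stand-in for the extractor's bookkeeping; NOT the real KafkaExtractor.
class ExtractorCloseSketch {
    static final int INITIAL_PARTITION_IDX = -1;

    private final int numPartitions;
    private int currentPartitionIdx = INITIAL_PARTITION_IDX;
    private long recordCount = 0; // records read since the last statistics flush
    int updateCalls = 0;          // how often the update method was entered
    int flushes = 0;              // how often it actually recorded something

    ExtractorCloseSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Simulates reading every partition to its end, flushing stats after each one.
    void consumeAll(int recordsPerPartition) {
        for (currentPartitionIdx = 0; currentPartitionIdx < numPartitions; currentPartitionIdx++) {
            recordCount += recordsPerPartition;
            updateStatisticsForCurrentPartition();
        }
        // Here currentPartitionIdx == numPartitions and recordCount == 0.
    }

    // Simulates stopping partway through the first partition (nothing flushed yet).
    void consumePartially(int records) {
        currentPartitionIdx = 0;
        recordCount += records;
    }

    boolean allPartitionsFinished() {
        return currentPartitionIdx >= numPartitions;
    }

    void updateStatisticsForCurrentPartition() {
        updateCalls++;
        if (recordCount == 0) {
            return; // nothing to record: the silent no-op described in the comment
        }
        flushes++;
        recordCount = 0;
    }

    // Old guard: still enters the update method after the last partition is done.
    void closeWithIndexCheck() {
        if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
            updateStatisticsForCurrentPartition();
        }
    }

    // New guard (body assumed): skips the call once every partition is finished.
    void closeWithFinishedCheck() {
        if (!allPartitionsFinished()) {
            updateStatisticsForCurrentPartition();
        }
    }

    public static void main(String[] args) {
        ExtractorCloseSketch oldStyle = new ExtractorCloseSketch(2);
        oldStyle.consumeAll(5);
        oldStyle.closeWithIndexCheck();
        System.out.println("old guard: calls=" + oldStyle.updateCalls + " flushes=" + oldStyle.flushes);

        ExtractorCloseSketch newStyle = new ExtractorCloseSketch(2);
        newStyle.consumeAll(5);
        newStyle.closeWithFinishedCheck();
        System.out.println("new guard: calls=" + newStyle.updateCalls + " flushes=" + newStyle.flushes);
    }
}
```

In this sketch both guards record the same statistics; the difference is that the new guard states the intent directly instead of relying on the update method to do nothing when recordCount is zero.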
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 339165)
Time Spent: 1h 40m (was: 1.5h)
> Refactor Kafka extractor statistics tracking to allow code reuse across both
> batch and streaming execution modes
> ----------------------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-kafka
> Affects Versions: 0.15.0
> Reporter: Sudarshan Vasudevan
> Assignee: Shirshanka Das
> Priority: Major
> Fix For: 0.15.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The current implementation of Kafka extractor statistics tracking is tightly
> coupled to the batch implementation of KafkaExtractor, which prevents it from
> being used in streaming Kafka extractor implementations. In addition to
> enabling code reuse, the refactoring makes it possible to write unit tests
> for the statistics tracker.