[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

ASF GitHub Bot (Jira) Tue, 05 Nov 2019 21:54:57 -0800


     [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339165
 ]


ASF GitHub Bot logged work on GOBBLIN-945:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Nov/19 05:53
            Start Date: 06/Nov/19 05:53
    Worklog Time Spent: 10m 
      Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449
 
 

 ##########
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##########
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
 
   @Override
   public void close() throws IOException {
-    if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
-      updateStatisticsForCurrentPartition();
+    if (!allPartitionsFinished()) {
 
 Review comment:
   Yes, the current implementation is confusing when the end of the partitions 
is reached. It calls updateStatisticsForCurrentPartition(), but the method 
essentially does nothing, since recordCount == 0. IMO the change is more 
readable.
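
   A minimal sketch of the point above, using a hypothetical, heavily 
simplified stand-in rather than the real KafkaExtractor. The class name, the 
statsUpdates list, the drained() helper, and the body of 
allPartitionsFinished() are all assumptions made for illustration. It only 
shows that, once every partition has been drained, the old guard calls a 
method that returns early, while the new guard skips the call outright.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the extractor; NOT the real KafkaExtractor.
final class CloseGuardSketch {
  static final int INITIAL_PARTITION_IDX = -1;

  private final int numPartitions;
  private int currentPartitionIdx = INITIAL_PARTITION_IDX;
  private long recordCount = 0; // records read from the current partition
  final List<String> statsUpdates = new ArrayList<>();

  CloseGuardSketch(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  // Assumed definition: true once the index has moved past the last partition.
  boolean allPartitionsFinished() {
    return currentPartitionIdx != INITIAL_PARTITION_IDX
        && currentPartitionIdx >= numPartitions;
  }

  void readRecord() {
    recordCount++;
  }

  // Flushes statistics for the partition being left, then resets the counter.
  void moveToNextPartition() {
    if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
      updateStatisticsForCurrentPartition();
    }
    currentPartitionIdx++;
    recordCount = 0;
  }

  void updateStatisticsForCurrentPartition() {
    if (recordCount == 0) {
      return; // nothing was read, so there is nothing to record
    }
    statsUpdates.add("partition " + currentPartitionIdx + ": " + recordCount + " records");
  }

  // Old guard: after the last partition this still calls the method, which is a no-op.
  void closeOld() {
    if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
      updateStatisticsForCurrentPartition();
    }
  }

  // New guard: states the intent directly and skips the call when all partitions are done.
  void closeNew() {
    if (!allPartitionsFinished()) {
      updateStatisticsForCurrentPartition();
    }
  }

  // Hypothetical helper: reads every partition to the end.
  static CloseGuardSketch drained(int numPartitions, int recordsPerPartition) {
    CloseGuardSketch s = new CloseGuardSketch(numPartitions);
    s.moveToNextPartition();
    while (!s.allPartitionsFinished()) {
      for (int i = 0; i < recordsPerPartition; i++) {
        s.readRecord();
      }
      s.moveToNextPartition();
    }
    return s;
  }

  public static void main(String[] args) {
    CloseGuardSketch a = drained(2, 3);
    a.closeOld();
    CloseGuardSketch b = drained(2, 3);
    b.closeNew();
    // Both guards leave the same statistics behind after a full drain.
    System.out.println(a.statsUpdates.equals(b.statsUpdates)); // prints "true"
  }
}
```

   In this sketch both guards still flush the pending count when close() is 
called in the middle of a partition, so the two versions differ in 
readability, not in the recorded statistics.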
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 339165)
    Time Spent: 1h 40m  (was: 1.5h)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-945
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-945
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-kafka
>    Affects Versions: 0.15.0
>            Reporter: Sudarshan Vasudevan
>            Assignee: Shirshanka Das
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The current implementation of Kafka extractor statistics tracking is deeply 
> integrated with the batch implementation of KafkaExtractor, preventing it 
> from being used in streaming Kafka extractor implementations. In addition to 
> enabling code reuse, the refactoring allows unit tests to be written for the 
> statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
