[ 
https://issues.apache.org/jira/browse/GOBBLIN-1493?focusedWorklogId=631762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631762
 ]

ASF GitHub Bot logged work on GOBBLIN-1493:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Jul/21 16:32
            Start Date: 30/Jul/21 16:32
    Worklog Time Spent: 10m 
      Work Description: sv2000 commented on a change in pull request #3336:
URL: https://github.com/apache/gobblin/pull/3336#discussion_r680064376



##########
File path: 
gobblin-api/src/main/java/org/apache/gobblin/service/ServiceConfigKeys.java
##########
@@ -58,6 +58,10 @@
   public static final String GOBBLIN_SERVICE_FLOW_CATALOG_LOCAL_COMMIT = GOBBLIN_SERVICE_PREFIX + "flowCatalog.localCommit";
   public static final boolean DEFAULT_GOBBLIN_SERVICE_FLOW_CATALOG_LOCAL_COMMIT = true;
 
+  // Job Level Keys
+  public static final String WORK_UNIT_BYTE_SIZE = GOBBLIN_SERVICE_PREFIX + ".work.unit.byte.size";

Review comment:
       WORK_UNIT_BYTE_SIZE is confusing to read. Can we just call it 
work_unit_size? Also, is this config intended only for file-based sources? 
What about other sources, e.g. the Kafka source, where this config could 
instead mean the number of records? 

##########
File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java
##########
@@ -463,6 +469,18 @@ public void apply(JobListener jobListener, JobContext jobContext)
           return;
         }
 
+        // calculation of total bytes to copy in a job used to track a job's copy progress
+        if (jobState.getPropAsBoolean(ConfigurationKeys.REPORT_JOB_PROGRESS, ConfigurationKeys.DEFAULT_REPORT_JOB_PROGRESS)) {
+          if (workUnitStream.isSafeToMaterialize()) {
+            long totalSizeInBytes = sumWorkUnitsSizes(workUnitStream);
+            this.jobContext.getJobState().setProp(TOTAL_BYTES_TO_COPY, totalSizeInBytes);

Review comment:
       TOTAL_BYTES_TO_COPY seems very Distcp-centric. We may want to think more 
broadly and just call this TOTAL_WORK_UNIT_SIZE. When individual sources do 
not provide workunit sizes, this should sum to the total number of 
workunits. 
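
       The fallback suggested above could look roughly like the sketch below. 
It is only an illustration, not the code in the pull request: the class 
WorkUnitSizeSketch, the property name "work.unit.size", and the use of plain 
java.util.Properties in place of Gobblin's WorkUnit are all assumptions. A 
workunit that declares no size is counted as 1, so a job whose sources 
declare no sizes totals to its workunit count. How to treat a job that mixes 
sized and unsized workunits is left open here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class WorkUnitSizeSketch {
  // Hypothetical property name; the real key is still under review in the PR.
  static final String WORK_UNIT_SIZE = "work.unit.size";

  /**
   * Sums the declared sizes of the given workunits. A workunit without a
   * declared size contributes 1, so the total degrades to a workunit count
   * when no source reports sizes.
   */
  static long totalWorkUnitSize(List<Properties> workUnits) {
    long total = 0L;
    for (Properties workUnit : workUnits) {
      String size = workUnit.getProperty(WORK_UNIT_SIZE);
      total += (size == null) ? 1L : Long.parseLong(size);
    }
    return total;
  }

  public static void main(String[] args) {
    Properties sized = new Properties();
    sized.setProperty(WORK_UNIT_SIZE, "1024");
    Properties unsized = new Properties();

    // One sized workunit plus one unsized workunit.
    System.out.println(totalWorkUnitSize(Arrays.asList(sized, unsized)));
    // No sizes declared: the total equals the number of workunits.
    System.out.println(totalWorkUnitSize(Arrays.asList(unsized, unsized, unsized)));
  }
}
```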




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 631762)
    Time Spent: 7h 20m  (was: 7h 10m)

> Data Copy Progress Reporting 
> -----------------------------
>
>                 Key: GOBBLIN-1493
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1493
>             Project: Apache Gobblin
>          Issue Type: New Feature
>          Components: gobblin-core, gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Progress reporting for a data copy will provide users with quantitative 
> feedback on the progress of a data copy job as a percentage as well as an 
> estimate of the time remaining for completion. This will update the existing 
> job status endpoint to include the progress percentage and estimate of time 
> left. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
