sv2000 commented on a change in pull request #3336:
URL: https://github.com/apache/gobblin/pull/3336#discussion_r680064376



##########
File path: gobblin-api/src/main/java/org/apache/gobblin/service/ServiceConfigKeys.java
##########
@@ -58,6 +58,10 @@
   public static final String GOBBLIN_SERVICE_FLOW_CATALOG_LOCAL_COMMIT = GOBBLIN_SERVICE_PREFIX + "flowCatalog.localCommit";
   public static final boolean DEFAULT_GOBBLIN_SERVICE_FLOW_CATALOG_LOCAL_COMMIT = true;
 
+  // Job Level Keys
+  public static final String WORK_UNIT_BYTE_SIZE = GOBBLIN_SERVICE_PREFIX + ".work.unit.byte.size";

Review comment:
       WORK_UNIT_BYTE_SIZE is confusing to read. Can we just call it 
work_unit_size? Also, is this config intended only for file-based sources? 
What about other sources, e.g. Kafka Source, where this config could mean 
the number of records?

##########
File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java
##########
@@ -463,6 +469,18 @@ public void apply(JobListener jobListener, JobContext jobContext)
           return;
         }
 
+        // calculation of total bytes to copy in a job used to track a job's copy progress
+        if (jobState.getPropAsBoolean(ConfigurationKeys.REPORT_JOB_PROGRESS, ConfigurationKeys.DEFAULT_REPORT_JOB_PROGRESS)) {
+          if (workUnitStream.isSafeToMaterialize()) {
+            long totalSizeInBytes = sumWorkUnitsSizes(workUnitStream);
+            this.jobContext.getJobState().setProp(TOTAL_BYTES_TO_COPY, totalSizeInBytes);

Review comment:
       TOTAL_BYTES_TO_COPY seems very Distcp-centric. We may want to think more 
broadly and just call this TOTAL_WORK_UNIT_SIZE. In cases where individual 
sources do not provide work unit sizes, this should sum to the total number of 
work units.
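
A rough sketch of the fallback described above, to make the intent concrete. It is not tied to the Gobblin API: work units are modeled as plain string maps, and the key name `work.unit.size` and the class `WorkUnitSizeSketch` are hypothetical names used only for illustration. A work unit that declares no size is counted as 1, so the total reduces to the number of work units when a source provides no sizes at all.

```java
import java.util.List;
import java.util.Map;

public class WorkUnitSizeSketch {
  // Hypothetical key name for illustration; not an existing Gobblin constant.
  public static final String WORK_UNIT_SIZE = "work.unit.size";

  /** Sums the declared sizes; a work unit with no declared size contributes 1. */
  public static long totalWorkUnitSize(List<Map<String, String>> workUnits) {
    long total = 0;
    for (Map<String, String> workUnit : workUnits) {
      String size = workUnit.get(WORK_UNIT_SIZE);
      total += (size == null) ? 1L : Long.parseLong(size);
    }
    return total;
  }

  public static void main(String[] args) {
    // A source that reports sizes (e.g. bytes per file).
    List<Map<String, String>> sized = List.of(
        Map.of(WORK_UNIT_SIZE, "100"),
        Map.of(WORK_UNIT_SIZE, "250"));
    // A source that reports no sizes: the total is the work unit count.
    List<Map<String, String>> unsized = List.of(
        Map.<String, String>of(),
        Map.<String, String>of(),
        Map.<String, String>of());

    System.out.println(totalWorkUnitSize(sized));   // 350
    System.out.println(totalWorkUnitSize(unsized)); // 3
  }
}
```

If a single job mixed sized and unsized work units, this sketch would add bytes to counts, so the real change would need to decide how to handle that case.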




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


