[ https://issues.apache.org/jira/browse/GOBBLIN-2147?focusedWorklogId=936487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-936487 ]

ASF GitHub Bot logged work on GOBBLIN-2147:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Sep/24 22:16
            Start Date: 25/Sep/24 22:16
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #4044:
URL: https://github.com/apache/gobblin/pull/4044#discussion_r1776035021


##########
gobblin-core/src/main/java/org/apache/gobblin/source/DatePartitionedNestedRetriever.java:
##########
@@ -129,8 +134,15 @@ public List<FileInfo> getFilesToProcess(long minWatermark, int maxFilesToReturn)
               new FileInfo(fileStatus.getPath().toString(), fileStatus.getLen(), date.getMillis(), partitionPath));
         }
       }
+
+      if (growthTracker.isAnotherMilestone(iteration++)) {
+        LOG.info("~{}~ collected files to process", filesToProcess.size());
+        LOG.info("Last Source Path processed : ~{}~", sourcePath);
+      }
     }
 
+    LOG.info("Finished processing files");

Review Comment:
   Same here. Also, let's log how many files were collected once processing completes: `filesToProcess.size()`.
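A minimal sketch of the logging pattern under review, assuming one file per scanned partition: progress is logged only when the iteration count hits a growing milestone, and the final line carries the total count the reviewer asks for. `MilestoneTracker` (powers-of-two milestones) is a hypothetical stand-in that only mirrors the `isAnotherMilestone(iteration++)` call seen in the diff; it is not Gobblin's own growth tracker, whose behavior is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class ProgressLoggingSketch {

  /** Hypothetical tracker: fires when the iteration reaches the next power of two. */
  static final class MilestoneTracker {
    private long nextMilestone = 1;

    boolean isAnotherMilestone(long iteration) {
      if (iteration >= nextMilestone) {
        // Advance past the current iteration so each milestone fires only once.
        while (nextMilestone <= iteration) {
          nextMilestone *= 2;
        }
        return true;
      }
      return false;
    }
  }

  /** Simulates scanning partitions, logging at milestones and once more with the final total. */
  static List<String> collect(int partitions, List<String> log) {
    MilestoneTracker tracker = new MilestoneTracker();
    List<String> filesToProcess = new ArrayList<>();
    long iteration = 0;
    for (int p = 0; p < partitions; p++) {
      filesToProcess.add("file-" + p); // one hypothetical file per partition
      if (tracker.isAnotherMilestone(iteration++)) {
        log.add(String.format("~%d~ collected files to process", filesToProcess.size()));
      }
    }
    // The reviewer's suggestion: report the total collected when the scan completes.
    log.add(String.format("Finished processing files; collected %d files to process", filesToProcess.size()));
    return filesToProcess;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    collect(10, log);
    log.forEach(System.out::println);
  }
}
```

Milestone-based logging keeps the log volume logarithmic in the number of iterations, while the single closing line still gives an exact count.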





Issue Time Tracking
-------------------

    Worklog Id:     (was: 936487)
    Time Spent: 50m  (was: 40m)

> Add lookback time property in PartitionedFileSource
> ---------------------------------------------------
>
>                 Key: GOBBLIN-2147
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2147
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Vivek Rai
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> All FileBasedSource implementations should have a config for lookback time.
>  
> Currently, FileBasedSources look for data starting from the time set by
> `conversion.min.watermark`, and the time granularity is decided by the lowest
> time denomination in the partition pattern. In many cases, including this one,
> that denomination is 1 second (determined by
> `gobblin.flow.input.dataset.descriptor.partition.pattern` = `yyyy-MM-dd_HH_mm_ss`).
>  
> This is an extremely inefficient way to find workunits, since candidate
> partition paths are checked one second at a time from the watermark up to now.
> Let's enable these jobs to use lookback time configs, as several other
> dataset finders do.
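To make the cost concrete, here is a rough sketch assuming the retriever checks one candidate partition path per time unit, and assuming a lookback would clamp the scan start to whichever is later: `conversion.min.watermark` or `now - lookback`. `LookbackSketch`, its methods, and the 1-day lookback value are hypothetical illustrations; the actual property name and semantics are defined by the PR, not here.

```java
import java.time.Duration;
import java.time.Instant;

public class LookbackSketch {

  /** Hypothetical: the later of the configured min watermark and (now - lookback), in epoch millis. */
  static long effectiveMinWatermark(long minWatermarkMillis, Duration lookback, Instant now) {
    long lookbackStart = now.minus(lookback).toEpochMilli();
    return Math.max(minWatermarkMillis, lookbackStart);
  }

  /** Candidate partition paths a per-unit scan would check from start to now, both ends inclusive. */
  static long partitionsToCheck(long startMillis, Instant now, Duration granularity) {
    long spanMillis = now.toEpochMilli() - startMillis;
    if (spanMillis < 0) {
      return 0;
    }
    return spanMillis / granularity.toMillis() + 1;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-09-25T00:00:00Z");
    long minWatermark = Instant.parse("2024-09-18T00:00:00Z").toEpochMilli(); // 7 days back
    Duration lookback = Duration.ofDays(1);       // hypothetical lookback setting
    Duration granularity = Duration.ofSeconds(1); // smallest unit of yyyy-MM-dd_HH_mm_ss

    long withoutLookback = partitionsToCheck(minWatermark, now, granularity);
    long start = effectiveMinWatermark(minWatermark, lookback, now);
    long withLookback = partitionsToCheck(start, now, granularity);
    System.out.println(withoutLookback); // 604801
    System.out.println(withLookback);    // 86401
  }
}
```

With a watermark 7 days old and 1-second granularity, the scan covers 604,801 candidate paths; a 1-day lookback cuts that to 86,401.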



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
