[ https://issues.apache.org/jira/browse/GOBBLIN-1708?focusedWorklogId=809680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-809680 ]

ASF GitHub Bot logged work on GOBBLIN-1708:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Sep/22 22:25
            Start Date: 16/Sep/22 22:25
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3563:
URL: https://github.com/apache/gobblin/pull/3563#discussion_r973484788


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +134,40 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
     return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
   }
 
+  public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String traversedDatePath) {
+    int[] startDateSplit = new int[] { startDate.getYear(), startDate.getMonthOfYear(), startDate.getDayOfMonth(),
+        startDate.getHourOfDay(), startDate.getMinuteOfHour(), startDate.getSecondOfMinute(), startDate.getMillisOfSecond() };
+    int[] endDateSplit = new int[] { endDate.getYear(), endDate.getMonthOfYear(), endDate.getDayOfMonth(),
+        endDate.getHourOfDay(), endDate.getMinuteOfHour(), endDate.getSecondOfMinute(), endDate.getMillisOfSecond() };
+
+    String[] traversedDatePathSplit = traversedDatePath.split("/");
+
+    // Only check the number of parameters that the traversedDatePath has traversed through so far
+    for (int index = 0; index < traversedDatePathSplit.length; index++) {
+      // Only attempt to parse the number if the entire string are digits
+      boolean onlyNumbers = traversedDatePathSplit[index].matches("^[0-9]+$");
+      if (onlyNumbers) {
+        if (Integer.parseInt(traversedDatePathSplit[index]) < startDateSplit[index] ||
+            Integer.parseInt(traversedDatePathSplit[index]) > endDateSplit[index]) {
+          return false;
+        }
+      }
+      else {
+        return false;
+      }
+    }
+    return true;

Review Comment:
   I believe this would not work for ranges that cross a year, month, or day boundary.
   
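   Consider traversedDatePathSplit == [2022, 09, 01, ...], with startDate 2022/08/20 and endDate 2022/09/10.
   The year (2022) and month (09) checks pass. The day is then compared on its own against the start date's day, and because 01 < 20 the method returns false, even though 2022/09/01 lies inside the range.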
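   For illustration, a hypothetical standalone replica of the field-by-field check reproduces this. The class and method names are made up. It uses java.time instead of the Joda-Time `LocalDateTime` in the diff and drops the millisecond field.

```java
import java.time.LocalDateTime;

// Hypothetical standalone replica of the field-by-field check in the diff,
// using java.time instead of Joda-Time (millisecond field dropped for brevity).
final class ComponentWiseCheckDemo {

  static boolean componentWiseCheck(LocalDateTime start, LocalDateTime end, String traversedDatePath) {
    int[] startSplit = { start.getYear(), start.getMonthValue(), start.getDayOfMonth(),
        start.getHour(), start.getMinute(), start.getSecond() };
    int[] endSplit = { end.getYear(), end.getMonthValue(), end.getDayOfMonth(),
        end.getHour(), end.getMinute(), end.getSecond() };
    String[] parts = traversedDatePath.split("/");
    for (int i = 0; i < parts.length; i++) {
      if (!parts[i].matches("^[0-9]+$")) {
        return false;
      }
      int value = Integer.parseInt(parts[i]);
      // Each field is compared with its own bounds in isolation, ignoring the
      // higher-order fields that were already traversed.
      if (value < startSplit[i] || value > endSplit[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    LocalDateTime start = LocalDateTime.of(2022, 8, 20, 0, 0);
    LocalDateTime end = LocalDateTime.of(2022, 9, 10, 0, 0);
    System.out.println(componentWiseCheck(start, end, "2022/09"));    // prints true
    System.out.println(componentWiseCheck(start, end, "2022/09/01")); // prints false, although it is in range
  }
}
```

   With these bounds, a day value would have to be both >= 20 and <= 10. As a result, no day-level folder between the two dates can pass.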
   
   I would follow Arjun's recommendation: convert the path strings to dates, round the start and end dates down to the same granularity as the traversed path, and then compare them as dates.
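   A rough sketch of that approach follows. It is not the PR's code. It assumes a yyyy/MM/dd/HH folder layout (at most four numeric components) and uses java.time instead of Joda-Time. The names `DatePrefixRangeCheck`, `truncateToDepth`, and `isPrefixInRange` are hypothetical. Truncation is done by hand because `LocalDateTime.truncatedTo` only accepts units up to `DAYS`.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;

// Sketch of the "compare as dates" approach. Assumes a yyyy/MM/dd/HH-style
// folder layout (at most four numeric components) and uses java.time rather
// than the Joda-Time LocalDateTime used in the diff.
final class DatePrefixRangeCheck {

  private static final int MAX_DEPTH = 4; // year, month, day, hour

  // Rounds t down to the granularity of a path prefix with `depth` components:
  // fields beyond the prefix depth are reset to their minimum values.
  static LocalDateTime truncateToDepth(LocalDateTime t, int depth) {
    return LocalDateTime.of(
        t.getYear(),
        depth >= 2 ? t.getMonthValue() : 1,
        depth >= 3 ? t.getDayOfMonth() : 1,
        depth >= 4 ? t.getHour() : 0,
        0);
  }

  // Returns true if the folder prefix (e.g. "2022/09" or "2022/09/01") can
  // overlap the inclusive range [start, end].
  static boolean isPrefixInRange(LocalDateTime start, LocalDateTime end, String traversedDatePath) {
    String[] parts = traversedDatePath.split("/");
    if (parts.length == 0 || parts.length > MAX_DEPTH) {
      return false;
    }
    int[] fields = { 0, 1, 1, 0 }; // defaults for year, month, day, hour
    LocalDateTime prefix;
    try {
      for (int i = 0; i < parts.length; i++) {
        if (!parts[i].matches("^[0-9]+$")) {
          return false; // non-numeric folder name
        }
        fields[i] = Integer.parseInt(parts[i]);
      }
      prefix = LocalDateTime.of(fields[0], fields[1], fields[2], fields[3], 0);
    } catch (NumberFormatException | DateTimeException e) {
      return false; // too many digits, or an impossible date such as month 13
    }
    // Round both bounds down to the prefix depth, then compare as dates.
    LocalDateTime lo = truncateToDepth(start, parts.length);
    LocalDateTime hi = truncateToDepth(end, parts.length);
    return !prefix.isBefore(lo) && !prefix.isAfter(hi);
  }
}
```

   With the dates from the example (2022/08/20 to 2022/09/10), `isPrefixInRange` returns true for "2022/09" and "2022/09/01" and false for "2022/08/19" and "2022/07". Rounding both bounds down to the prefix depth lets a month-level folder such as 2022/08 count as in range even though the range starts mid-month. The traversal can then descend into that folder and filter at the day level.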





Issue Time Tracking
-------------------

    Worklog Id:     (was: 809680)
    Time Spent: 1h 40m  (was: 1.5h)

> Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders that match range
> --------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1708
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1708
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Andy Jiang
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
