[
https://issues.apache.org/jira/browse/GOBBLIN-1708?focusedWorklogId=809623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-809623
]
ASF GitHub Bot logged work on GOBBLIN-1708:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 16/Sep/22 18:44
Start Date: 16/Sep/22 18:44
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3563:
URL: https://github.com/apache/gobblin/pull/3563#discussion_r973300712
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +135,37 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
     return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
   }
+  public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String traversedDatePath) {
+    int[] startDateSplit = new int[] { startDate.getYear(), startDate.getMonthOfYear(), startDate.getDayOfMonth(),
+        startDate.getHourOfDay(), startDate.getMinuteOfHour(), startDate.getSecondOfMinute(), startDate.getMillisOfSecond() };
+    int[] endDateSplit = new int[] { endDate.getYear(), endDate.getMonthOfYear(), endDate.getDayOfMonth(),
+        endDate.getHourOfDay(), endDate.getMinuteOfHour(), endDate.getSecondOfMinute(), endDate.getMillisOfSecond() };
+
+    String[] traversedDatePathSplit = traversedDatePath.split("/");
+
+    // Only check the number of parameters that the traversedDatePath has traversed through so far
+    for (int index = 0; index < traversedDatePathSplit.length; index++) {
+      try {
+        if (Integer.parseInt(traversedDatePathSplit[index]) < startDateSplit[index] ||
+            Integer.parseInt(traversedDatePathSplit[index]) > endDateSplit[index]) {
+          return false;
+        }
+      } catch (Exception e) {
Review Comment:
What exception would be thrown here? We should avoid a wide catch and a
silent return.
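For reference, `Integer.parseInt` throws `NumberFormatException` on a non-numeric segment, and indexing `startDateSplit[index]` would throw `ArrayIndexOutOfBoundsException` if the path ever had more than seven segments. A minimal sketch of a narrower catch could look like the following; `PathSegmentParser` and `parseSegment` are hypothetical names used only for illustration and are not part of the PR.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of the PR: parses one segment of a date path
// and catches only the exception that Integer.parseInt is documented to throw.
final class PathSegmentParser {

  public static OptionalInt parseSegment(String segment) {
    try {
      return OptionalInt.of(Integer.parseInt(segment));
    } catch (NumberFormatException e) {
      // Non-numeric (or empty) segment: report "no value" explicitly.
      // Any other exception type is left to propagate to the caller.
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(parseSegment("2022").isPresent());   // numeric segment
    System.out.println(parseSegment("daily").isPresent());  // non-numeric segment
  }
}
```

With something like this, the caller can decide explicitly what a non-numeric segment means (skip, log, or fail) instead of folding every failure into `return false`.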
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +135,37 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
     return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
   }
+  public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String traversedDatePath) {
+    int[] startDateSplit = new int[] { startDate.getYear(), startDate.getMonthOfYear(), startDate.getDayOfMonth(),
+        startDate.getHourOfDay(), startDate.getMinuteOfHour(), startDate.getSecondOfMinute(), startDate.getMillisOfSecond() };
+    int[] endDateSplit = new int[] { endDate.getYear(), endDate.getMonthOfYear(), endDate.getDayOfMonth(),
+        endDate.getHourOfDay(), endDate.getMinuteOfHour(), endDate.getSecondOfMinute(), endDate.getMillisOfSecond() };
+
+    String[] traversedDatePathSplit = traversedDatePath.split("/");
+
+    // Only check the number of parameters that the traversedDatePath has traversed through so far
+    for (int index = 0; index < traversedDatePathSplit.length; index++) {
+      try {
+        if (Integer.parseInt(traversedDatePathSplit[index]) < startDateSplit[index] ||
+            Integer.parseInt(traversedDatePathSplit[index]) > endDateSplit[index]) {
+          return false;
+        }
+      } catch (Exception e) {
+        return false;
+      }
+    }
+    return true;
+  }
+
   private List<FileStatus> recursivelyGetFilesAtDatePath(FileSystem fs, Path path, String traversedDatePath, PathFilter fileFilter,
       int level, LocalDateTime startDate, LocalDateTime endDate, DateTimeFormatter formatter) throws IOException {
     List<FileStatus> fileStatuses = Lists.newArrayList();
+    if (!Objects.equals(traversedDatePath, "")) {
Review Comment:
You can use traversedDatePath.equals("") here, unless you think it can be null?
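To illustrate the difference between the two checks, here is a small sketch; `EmptyPathCheck` and its method names are hypothetical and exist only for this example. `Objects.equals(null, "")` returns false without throwing, whereas calling `.equals("")` on a null reference throws `NullPointerException`.

```java
import java.util.Objects;

// Hypothetical class, not part of the PR: contrasts the two ways of testing
// whether any date path has been traversed yet.
final class EmptyPathCheck {

  // Null-tolerant form used in the diff: a null path counts as "not empty".
  public static boolean hasTraversedNullSafe(String traversedDatePath) {
    return !Objects.equals(traversedDatePath, "");
  }

  // Plain form suggested in the review: throws NullPointerException on null.
  public static boolean hasTraversed(String traversedDatePath) {
    return !traversedDatePath.equals("");
  }

  public static void main(String[] args) {
    System.out.println(hasTraversedNullSafe(""));        // empty path
    System.out.println(hasTraversed("2022/09"));         // non-empty path
    System.out.println(hasTraversedNullSafe(null));      // null does not throw here
  }
}
```

If the argument is never null (the initial call in the diff passes `""`), the two forms behave identically and the shorter one reads more directly.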
Issue Time Tracking
-------------------
Worklog Id: (was: 809623)
Time Spent: 20m (was: 10m)
> Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders
> that match range
> --------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1708
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1708
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Andy Jiang
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)