[
https://issues.apache.org/jira/browse/GOBBLIN-1708?focusedWorklogId=810219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-810219
]
ASF GitHub Bot logged work on GOBBLIN-1708:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 20/Sep/22 00:56
Start Date: 20/Sep/22 00:56
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3563:
URL: https://github.com/apache/gobblin/pull/3563#discussion_r974794192
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +134,51 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
}
+ /**
+ * Checks if the datePath provided is in the range of the start and end dates.
+ * Rounds startDate and endDate to the same granularity as datePath prior to comparing.
+ * Returns true if the datePath provided is in the range of start and end dates, inclusive.
+ * @param startDate
+ * @param endDate
+ * @param datePath
+ * @param datePathFormat (This is the user set desired format)
+ * @param level
+ * @return true/false
+ */
+ public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String datePath, String datePathFormat, int level) {
+ String [] array = datePathFormat.split("/");
+ StringBuilder datePathPattern = new StringBuilder();
+
+ for (int index = 1; index < level; index++) {
+ if (index > 1) {
+ datePathPattern.append("/");
+ }
+ datePathPattern.append(array[index - 1]);
+ }
+
+ try {
+ DateTimeFormatter formatGranularity = DateTimeFormat.forPattern(datePathPattern.toString());
+ LocalDateTime traversedDatePathRound = formatGranularity.parseLocalDateTime(datePath);
+ LocalDateTime startDateRound = formatGranularity.parseLocalDateTime(startDate.toString(datePathPattern.toString()));
+ LocalDateTime endDateRound = formatGranularity.parseLocalDateTime(endDate.toString(datePathPattern.toString()));
+
+ boolean afterOrOnStartDate = traversedDatePathRound.isAfter(startDateRound) || traversedDatePathRound.isEqual(startDateRound);
+ boolean beforeOrOnEndDate = traversedDatePathRound.isBefore(endDateRound) || traversedDatePathRound.isEqual(endDateRound);
+ return afterOrOnStartDate && beforeOrOnEndDate;
+ } catch (IllegalArgumentException e) {
+ log.error("Cannot parse path " + datePath);
Review Comment:
Include the expected format in this log message too, e.g.
`String.format("Cannot parse path at %s, expected in format of %s", datePath, datePathPattern)`
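A minimal sketch of the message this comment asks for, using only `String.format` from the standard library. The class name `ParseErrorMessage` and the method `describe` are hypothetical and exist only for illustration; in the PR the resulting string would be passed to `log.error`.

```java
final class ParseErrorMessage {
    // Hypothetical helper: builds the message proposed in the review comment.
    // The second argument is typed as Object because datePathPattern is a
    // StringBuilder in the diff; %s formats it via toString().
    public static String describe(String datePath, Object datePathPattern) {
        return String.format("Cannot parse path at %s, expected in format of %s",
            datePath, datePathPattern);
    }

    public static void main(String[] args) {
        // Example inputs are made up for illustration.
        System.out.println(describe("2022/09/xx", new StringBuilder("yyyy/MM/dd")));
    }
}
```

Naming both the offending path and the pattern it was matched against makes it possible to tell a malformed folder name from a misconfigured format.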
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +134,51 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
}
+ /**
+ * Checks if the datePath provided is in the range of the start and end dates.
+ * Rounds startDate and endDate to the same granularity as datePath prior to comparing.
+ * Returns true if the datePath provided is in the range of start and end dates, inclusive.
+ * @param startDate
+ * @param endDate
+ * @param datePath
+ * @param datePathFormat (This is the user set desired format)
+ * @param level
+ * @return true/false
+ */
+ public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String datePath, String datePathFormat, int level) {
+ String [] array = datePathFormat.split("/");
+ StringBuilder datePathPattern = new StringBuilder();
+
+ for (int index = 1; index < level; index++) {
+ if (index > 1) {
+ datePathPattern.append("/");
+ }
+ datePathPattern.append(array[index - 1]);
+ }
Review Comment:
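I think an easier way of doing this is to drop the loop, rename `array` to `datePathFormatArray`, and use
`String.join("/", Arrays.asList(datePathFormatArray).subList(0, level - 1));`
which essentially gives you the reconstructed datePathFormat. The upper bound is `level - 1` because the loop above appends `array[0]` through `array[level - 2]`.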
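The sketch below puts the loop from the diff next to a `String.join` version so the two can be compared. The class name `DatePathPatternSketch` and both method names are hypothetical. The `subList` upper bound of `level - 1` is inferred from the loop as written in the diff, which appends `level - 1` components; it is an assumption about the intended behavior, not something the PR confirms.

```java
import java.util.Arrays;

final class DatePathPatternSketch {
    // Mirrors the loop in the diff: joins the first (level - 1) components.
    public static String viaLoop(String datePathFormat, int level) {
        String[] array = datePathFormat.split("/");
        StringBuilder datePathPattern = new StringBuilder();
        for (int index = 1; index < level; index++) {
            if (index > 1) {
                datePathPattern.append("/");
            }
            datePathPattern.append(array[index - 1]);
        }
        return datePathPattern.toString();
    }

    // Same result via String.join over a sub-list of the split format.
    public static String viaJoin(String datePathFormat, int level) {
        String[] datePathFormatArray = datePathFormat.split("/");
        return String.join("/", Arrays.asList(datePathFormatArray).subList(0, level - 1));
    }

    public static void main(String[] args) {
        // Example format is made up for illustration.
        System.out.println(viaLoop("yyyy/MM/dd/HH", 3));
        System.out.println(viaJoin("yyyy/MM/dd/HH", 3));
    }
}
```

One behavioral difference to keep in mind: if `level - 1` exceeds the number of components, the loop throws `ArrayIndexOutOfBoundsException` while `subList` throws `IndexOutOfBoundsException`, so callers that catch one would need to account for the other.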
Issue Time Tracking
-------------------
Worklog Id: (was: 810219)
Time Spent: 3h 10m (was: 3h)
> Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders
> that match range
> --------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1708
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1708
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Andy Jiang
> Priority: Major
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>