bvaradar commented on a change in pull request #1964:
URL: https://github.com/apache/hudi/pull/1964#discussion_r475895369
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -296,6 +300,42 @@ public boolean isBeforeTimelineStarts(String instant) {
return details.apply(instant);
}
+ /**
Review comment:
Timeline APIs are only about instants in general. I think adding
partitions here is breaking that abstraction. Can you move this to some helper
class
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
##########
@@ -232,6 +233,12 @@
*/
Option<byte[]> getInstantDetails(HoodieInstant instant);
+ /**
+ * Returns partitions that have been modified in the timeline. This includes
internal operations such as clean.
+ * Note that this only returns data for completed instants.
+ */
+ List<String> getPartitionsMutated();
Review comment:
+1 on abstraction point. I think having a separate helper class would be
better.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -296,6 +300,42 @@ public boolean isBeforeTimelineStarts(String instant) {
return details.apply(instant);
}
+ /**
+ * Returns partitions that have been modified in the timeline. This includes
internal operations such as clean.
+ * Note that this only returns data for completed instants.
+ */
+ public List<String> getPartitionsMutated() {
+ return filterCompletedInstants().getInstants().flatMap(s -> {
+ switch (s.getAction()) {
+ case HoodieTimeline.COMMIT_ACTION:
+ case HoodieTimeline.DELTA_COMMIT_ACTION:
+ try {
+ HoodieCommitMetadata commitMetadata =
HoodieCommitMetadata.fromBytes(getInstantDetails(s).get(),
HoodieCommitMetadata.class);
+ return commitMetadata.getPartitionToWriteStats().keySet().stream();
+ } catch (IOException e) {
+ throw new HoodieIOException("Failed to get partitions written
between " + firstInstant() + " " + lastInstant(), e);
+ }
+ case HoodieTimeline.CLEAN_ACTION:
Review comment:
We dont need to look at clean and (event compaction) for figuring out
changed partitions for the case of hive syncing.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]