codope commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1181526980


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set<String> droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   I am going to change only the logs introduced in this PR. I think we
should take up the cleanup of the existing logs in a separate PR.
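
   For reference, the branch condition in the hunk above can be sketched in isolation. This is only an illustration, not the Hudi implementation: `SyncDecisionSketch`, `shouldSyncAllPartitions`, and the `activeTimelineStart` parameter are hypothetical names, and the sketch assumes commit times are fixed-width timestamp strings, so that plain string comparison matches chronological order. The real check is delegated to `isBeforeTimelineStarts` on the active timeline.

```java
import java.util.Optional;

public class SyncDecisionSketch {

    // Hypothetical helper (not part of Hudi): returns true when the sync should
    // fall back to listing all partitions on storage, i.e. when no commit has
    // been synced yet, or the last synced commit is older than the first
    // instant still present in the active timeline.
    // Assumption: instants are fixed-width timestamp strings, so lexicographic
    // order equals chronological order.
    static boolean shouldSyncAllPartitions(Optional<String> lastCommitTimeSynced,
                                           String activeTimelineStart) {
        return !lastCommitTimeSynced.isPresent()
            || lastCommitTimeSynced.get().compareTo(activeTimelineStart) < 0;
    }

    public static void main(String[] args) {
        String activeTimelineStart = "20230401000000";

        // Nothing synced yet: full partition listing.
        System.out.println(shouldSyncAllPartitions(Optional.empty(), activeTimelineStart));

        // Last synced commit is before the active timeline starts: full listing.
        System.out.println(shouldSyncAllPartitions(Optional.of("20230301000000"), activeTimelineStart));

        // Last synced commit is still in the active timeline: incremental sync.
        System.out.println(shouldSyncAllPartitions(Optional.of("20230415000000"), activeTimelineStart));
    }
}
```

   With these sample values, the first two cases take the `syncAllPartitions` path and the third takes the incremental path that reads written and dropped partitions since the last synced commit.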



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

