vinothchandar commented on a change in pull request #1105: [HUDI-405] Fix sync
no hive partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#discussion_r359135764
##########
File path: hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -632,24 +633,33 @@ public void close() {
if (!lastCommitTimeSynced.isPresent()) {
LOG.info("Last commit time synced is not known, listing all partitions in " + syncConfig.basePath + ",FS :" + fs);
try {
- return FSUtils.getAllPartitionPaths(fs, syncConfig.basePath, syncConfig.assumeDatePartitioning);
+
+ List<String> fsPartitions = FSUtils.getAllPartitionPaths(fs, syncConfig.basePath, syncConfig.assumeDatePartitioning);
+ List<String> tlPartitions = findPartitionsAfter(HOODIE_FIRST_COMMIT_TIME);
Review comment:
This code attempts to union the results of `FSUtils.getAllPartitionPaths`
and the timeline search. But we only keep the last N commit files in the
active timeline (the rest get archived), so partitions written by archived
commits will never be inspected by `findPartitionsAfter`. In other words, to
handle all cases properly we need `FSUtils.getAllPartitionPaths` to work
regardless, so this change is not guaranteed to fix the issue. If
`assumeDatePartitioning` is set to true (it is false by default), can we just
assume the user knows what they are doing and ignore this case?
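
To make the concern concrete, here is a minimal, self-contained sketch of why
a timeline-only search can miss partitions. It does not use the Hudi API:
`Commit`, `partitionsFromActiveTimeline`, `union` and the `retained` count are
hypothetical stand-ins for the active timeline, `findPartitionsAfter`, the
union in this patch and the "last N commits" retention.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionUnionSketch {

  /** Hypothetical stand-in for a commit: an instant time plus the partitions it wrote to. */
  public static final class Commit {
    public final String instantTime;
    public final List<String> partitions;

    public Commit(String instantTime, String... partitions) {
      this.instantTime = instantTime;
      this.partitions = Arrays.asList(partitions);
    }
  }

  /**
   * Collects partitions from the last {@code retained} commits only, mimicking a
   * timeline search that cannot see archived commits.
   */
  public static Set<String> partitionsFromActiveTimeline(List<Commit> allCommits, int retained) {
    int from = Math.max(0, allCommits.size() - retained);
    Set<String> result = new LinkedHashSet<>();
    for (Commit commit : allCommits.subList(from, allCommits.size())) {
      result.addAll(commit.partitions);
    }
    return result;
  }

  /** Order-preserving union of two partition listings. */
  public static Set<String> union(Collection<String> first, Collection<String> second) {
    Set<String> result = new LinkedHashSet<>(first);
    result.addAll(second);
    return result;
  }

  public static void main(String[] args) {
    // Three commits, each writing to a different date partition (made-up data).
    List<Commit> commits = Arrays.asList(
        new Commit("001", "2019/12/01"),
        new Commit("002", "2019/12/02"),
        new Commit("003", "2019/12/03"));

    // Assume only the last 2 commits are still in the active timeline.
    Set<String> fromTimeline = partitionsFromActiveTimeline(commits, 2);
    System.out.println("timeline only: " + fromTimeline); // [2019/12/02, 2019/12/03]

    // If the filesystem listing comes back empty, the union is still incomplete.
    Set<String> merged = union(Arrays.<String>asList(), fromTimeline);
    System.out.println("contains 2019/12/01: " + merged.contains("2019/12/01")); // false
  }
}
```

In this sketch the partition written by the archived commit `001` only shows
up if the filesystem listing itself returns it, which is the reason the
listing has to work regardless of the timeline search.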