lamber-ken edited a comment on issue #1105: [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-566390544
 
 
   > Don't follow why the partitions are not visible after the commit? Can we 
first layout the root cause for that?
   
   ### Why the first sync finds no partitions
   
   On the first sync, the target table has no `lastCommitTimeSynced`, so 
HoodieHiveClient lists all partition paths through 
`FSUtils.getAllPartitionPaths`. If `HIVE_ASSUME_DATE_PARTITION_OPT_KEY` is set 
to `true`, FSUtils only matches paths of the form `basePath + /*/*/*`, but the 
actual partition path here is `basePath + /yyyy-MM-dd`, so no partition is found.
   
   
![image](https://user-images.githubusercontent.com/20113411/70967797-5cc72f00-20d2-11ea-8004-6d910879d1ac.png)
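
To make the depth mismatch concrete, here is a minimal sketch. It does not call Hudi or Hadoop. It uses the JDK's `java.nio` glob matcher as a stand-in for the filesystem glob, and the class and method names (`PartitionGlobDemo`, `matchesDatePartitionGlob`) are hypothetical, chosen only for illustration. The assumption is that a three-level pattern like `*/*/*` needs exactly three path components, so a single-level `yyyy-MM-dd` folder can never match it.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PartitionGlobDemo {

    // Hypothetical helper (not a Hudi API): checks whether a partition path,
    // given relative to basePath, matches a three-level glob such as */*/*.
    public static boolean matchesDatePartitionGlob(String relativePartitionPath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:*/*/*");
        return matcher.matches(Paths.get(relativePartitionPath));
    }

    public static void main(String[] args) {
        // yyyy/MM/dd layout: three components, so the glob can match it.
        System.out.println("2019/12/17 -> " + matchesDatePartitionGlob("2019/12/17"));
        // yyyy-MM-dd layout: one component, so the three-level glob cannot match it.
        System.out.println("2019-12-17 -> " + matchesDatePartitionGlob("2019-12-17"));
    }
}
```

With a `yyyy-MM-dd` layout the listing comes back empty, which is consistent with the first sync registering no partitions.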
   
   ### Two ways to solve this problem
   1. Set `HIVE_ASSUME_DATE_PARTITION_OPT_KEY` to `false`. HoodieHiveClient 
will then list all partition folders; see `FSUtils#getAllPartitionPaths` for 
details.
   
   2. If the user supplies a custom partition extractor, HiveSyncTool syncs no 
partitions at the first commit. In that case we can get the partition info from 
`HoodieTimeline`, as in the code I modified.
   
   IMO, the second solution guarantees that we can sync partitions on the 
first commit whether or not `HIVE_ASSUME_DATE_PARTITION_OPT_KEY` is `true`.
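
As a rough illustration of the second approach, the sketch below derives the partition set from the files a commit wrote instead of globbing the filesystem. It is a plain-Java model, not the code in this PR: it assumes the commit metadata can be reduced to a list of file paths relative to `basePath`, and `PartitionsFromCommits` and `partitionsOf` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PartitionsFromCommits {

    // Hypothetical helper (not a Hudi API): the partition of a written file is
    // taken to be its parent directory, relative to basePath.
    public static Set<String> partitionsOf(List<String> writtenFilePaths) {
        Set<String> partitions = new TreeSet<>();
        for (String path : writtenFilePaths) {
            int lastSlash = path.lastIndexOf('/');
            // A file directly under basePath belongs to the empty (non-partitioned) path.
            partitions.add(lastSlash < 0 ? "" : path.substring(0, lastSlash));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up file names standing in for the paths recorded by a commit.
        List<String> written = Arrays.asList(
            "2019-12-17/file-a.parquet",
            "2019-12-17/file-b.parquet",
            "2019-12-18/file-c.parquet");
        System.out.println(partitionsOf(written));
    }
}
```

Because the partitions come from what was actually written, the result does not depend on how many directory levels the partition path has.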
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
