matthiasdg commented on issue #6277:
URL: https://github.com/apache/hudi/issues/6277#issuecomment-1205204572

   Yes, I have also noticed the "last commit time synced is..." and "getting 
commits since then" log messages. Based on the logs, it seems that the timestamp 
of the last synced commit is what gets used (not the time of the sync itself).
   I am wondering how that works with commit retention now.
   
   I even experience issues with a single writer and only manual syncing.
   A test I did just now: I start with a fresh Hive table from my existing data; 
I run HiveSyncTool and it says it adds 8753 partitions. After that I ingest the 
new data (this spans a lot of commits, e.g. 60) and run HiveSyncTool again: 1506 
partitions are added (total=10259). 
   Now I drop the table and rerun HiveSyncTool on all data at once: 11055 
partitions are added, i.e. 796 more than the incremental total. So I am not sure 
why there is a difference. We do not do this kind of thing very often (most of 
the time it's just adding data for existing devices, so only a day and/or month 
partition will be added), but it's a bit troubling that there are no 
warnings/errors.
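
   To make the gap concrete, here is a minimal sketch of the comparison. The 
counts are the ones reported above; `missing_partitions` and the example 
partition names are hypothetical helpers for illustration, not part of Hudi or 
HiveSyncTool. Given the partition listings from both runs (obtained however is 
convenient), it would show which partitions the incremental syncs skipped.

```python
def missing_partitions(incremental, full):
    """Partitions present after the full sync but absent after the incremental syncs."""
    return sorted(set(full) - set(incremental))


# Counts reported by HiveSyncTool in the test described above.
first_sync = 8753
second_sync = 1506
full_sync = 11055

incremental_total = first_sync + second_sync
gap = full_sync - incremental_total
print(f"incremental total: {incremental_total}, full sync: {full_sync}, gap: {gap}")

# Hypothetical partition names, only to show how the helper is used.
example_incremental = ["device=a/month=2022-07", "device=b/month=2022-07"]
example_full = example_incremental + ["device=c/month=2022-07"]
example_missing = missing_partitions(example_incremental, example_full)
print(example_missing)
```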

