matthiasdg commented on issue #6277: URL: https://github.com/apache/hudi/issues/6277#issuecomment-1205204572
Yes, I have also noticed the "last commit time synced is..." and "getting commits since then" log messages. Based on the logs, it seems the timestamp of the last synced commit is used (not the time of the sync itself). I wonder how that interacts with commit retention now.

I even experience issues with a single writer and only manual syncing. A test I did just now: I start with a fresh Hive table from my existing data; I run HiveSyncTool and it says it adds 8753 partitions. After that I ingest the new data (this spans a lot of commits, e.g. 60) and run HiveSyncTool again: 1506 partitions are added (total = 10259). Now I drop the table and rerun HiveSyncTool on all data at once: 11055 partitions are added. So the incremental path registered 796 fewer partitions than the full sync, and I am not sure why there is a difference. We don't do this kind of thing often (most of the time it's just adding data for existing devices, so only a day and/or month partition will be added), but it's a bit troubling that there are no warnings or errors.
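To make the discrepancy concrete, here is a minimal Python sketch of the arithmetic from the test above, plus a way one might list which partitions are missing. The helper `missing_partitions` and the example partition paths are hypothetical, not part of Hudi or HiveSyncTool; the real lists would have to come from the table's storage layout and from the metastore (for example via `SHOW PARTITIONS`).

```python
# Counts reported in the test described above.
initial_sync = 8753      # partitions added by the first HiveSyncTool run
incremental_sync = 1506  # partitions added by the second run, after new ingestion
full_resync = 11055      # partitions added after dropping the table and resyncing

incremental_total = initial_sync + incremental_sync
missing_count = full_resync - incremental_total
print(f"incremental total: {incremental_total}, missing: {missing_count}")


def missing_partitions(on_storage, in_metastore):
    """Hypothetical helper: partition paths on storage that the metastore lacks."""
    return sorted(set(on_storage) - set(in_metastore))


# Made-up partition paths, purely for illustration.
on_storage = ["dev=a/2022/08", "dev=b/2022/08", "dev=c/2022/08"]
in_metastore = ["dev=a/2022/08"]
print(missing_partitions(on_storage, in_metastore))
```

Running such a comparison after the incremental sync would show whether the missing partitions cluster around particular commits or devices.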
