zhaofuw commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1641686568
Hello, I've run into a tricky problem. We're using Flink to sync binlogs to Hudi. To keep the main ingestion job stable, I run compaction offline. I validate the data regularly: at first the synchronized data was consistent, but today I discovered that it has become inconsistent. I found that some '*.log.*' files have never been merged into parquet. I tried running 'HoodieRepairTool', but the result was that all of these unmerged '*.log.*' files were cleaned up. Why would metadata be lost? I did not add any lock-related configuration to my offline compaction job; could that be the cause of the data inconsistency? I apologize for asking here.
