Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]

via GitHub Tue, 16 Apr 2024 23:11:21 -0700


juice411 commented on issue #11016:
URL: https://github.com/apache/hudi/issues/11016#issuecomment-2060452646


   
![image](https://github.com/apache/hudi/assets/10968514/9c567a2c-9237-453c-8706-af380cf28a6b)
   During testing, we ran into an unusual issue with a Hudi stream read table. 
When the downstream processing system fetches data from the upstream Hudi table 
(designed as a stream read table) and tries to process it, it reports that it 
cannot find certain log files. The absence of those files is expected, since 
their data has already been merged into Parquet files. The question is: why is 
Hudi still looking for files that no longer exist?
   
   This causes inconsistencies in the downstream processing results, which 
leads us to suspect that the downstream system is not capturing all of the data 
from the upstream table. We would like to understand the root cause of this 
behavior and whether there are any recommended workarounds or configurations 
we should be aware of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

