WTa-hash commented on issue #4241:
URL: https://github.com/apache/hudi/issues/4241#issuecomment-990307460


   I ran a test where I deleted some .parquet files from a Hudi table (stored 
on S3) to simulate S3 replication lagging behind and leaving the target S3 
bucket with missing data files. I then used AWS Athena to query the Hudi table 
with the missing data files.
   
   Results:
   1. The query ran successfully with missing data files (.parquet) - AWS 
Athena did not error out.
   2. The returned results were incorrect (the count and the data returned did 
not match the original data lake). This makes sense because the .parquet data 
files are missing.
   
   Next, I wonder what happens if S3 replication is slow and cannot replicate 
the files in the .hoodie folder fast enough: what happens when I run SQL 
queries on a Hudi table with missing .hoodie files?
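
   For anyone reproducing this, here is a minimal sketch of how the gap between 
the two buckets could be checked before querying. It assumes the object keys of 
the source and target buckets have already been listed (for example with an S3 
listing call) and passed in as plain strings. The function name 
`find_unreplicated_keys` and the example keys are hypothetical and not part of 
Hudi, Athena, or any AWS API.

```python
from typing import Dict, Iterable, List


def find_unreplicated_keys(
    source_keys: Iterable[str], target_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Group keys that exist in the source listing but not in the target."""
    missing = sorted(set(source_keys) - set(target_keys))
    report: Dict[str, List[str]] = {"data_files": [], "hoodie_files": [], "other": []}
    for key in missing:
        # Anything under a ".hoodie/" folder is treated as table metadata.
        if "/.hoodie/" in f"/{key}":
            report["hoodie_files"].append(key)
        # Parquet files outside ".hoodie/" are treated as data files.
        elif key.endswith(".parquet"):
            report["data_files"].append(key)
        else:
            report["other"].append(key)
    return report


# Hypothetical key listings, for illustration only.
source = ["tbl/.hoodie/001.commit", "tbl/p=1/a.parquet", "tbl/p=1/b.parquet"]
target = ["tbl/p=1/a.parquet"]
print(find_unreplicated_keys(source, target))
```

An empty report would suggest the target listing has caught up with the source 
listing at the time both were taken. It says nothing about files written 
afterwards, so it is only a point-in-time check.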



