WTa-hash commented on issue #4241: URL: https://github.com/apache/hudi/issues/4241#issuecomment-990307460
I ran a test in which I deleted some .parquet files from a Hudi table stored on S3. This simulates S3 replication lagging behind in the copy process, which leaves the target S3 bucket with missing data files. I then used AWS Athena to query the Hudi table with the missing data files.

Result:
1. The query ran successfully despite the missing .parquet data files; AWS Athena did not error out.
2. The returned results were incorrect: the count and the data returned did not match the original data lake.

This makes sense because the .parquet data files are missing.

Next, I wonder what happens if S3 replication is slow and cannot replicate the files in the .hoodie folder fast enough. What happens when I run SQL queries on a Hudi table with missing .hoodie files?
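Since Athena does not raise an error in this situation, one way to notice the gap before querying might be to compare the object keys in the source and target buckets. Below is a minimal sketch of that comparison, not anything Hudi or Athena provides. The function name `find_replication_gap` and the example keys are hypothetical, and fetching the key listings from S3 is assumed to happen elsewhere and is not shown.

```python
from typing import Iterable, List, NamedTuple


class ReplicationGap(NamedTuple):
    """Keys present in the source bucket but absent from the target."""
    missing_data_files: List[str]      # .parquet files outside .hoodie
    missing_metadata_files: List[str]  # anything under the .hoodie folder


def _is_hoodie_metadata(key: str) -> bool:
    # Assumption: table metadata lives under a ".hoodie/" path segment.
    return key.startswith(".hoodie/") or "/.hoodie/" in key


def find_replication_gap(
    source_keys: Iterable[str], target_keys: Iterable[str]
) -> ReplicationGap:
    """Hypothetical helper: diff two key listings of the same table path."""
    missing = set(source_keys) - set(target_keys)
    metadata = sorted(k for k in missing if _is_hoodie_metadata(k))
    data = sorted(
        k for k in missing
        if k.endswith(".parquet") and not _is_hoodie_metadata(k)
    )
    return ReplicationGap(data, metadata)


if __name__ == "__main__":
    # Made-up keys, for illustration only.
    source = [
        "tbl/.hoodie/001.commit",
        "tbl/part=a/f1.parquet",
        "tbl/part=a/f2.parquet",
    ]
    target = ["tbl/part=a/f1.parquet"]
    gap = find_replication_gap(source, target)
    print("missing data files:", gap.missing_data_files)
    print("missing .hoodie files:", gap.missing_metadata_files)
```

A non-empty result would suggest that replication has not caught up yet and that query results against the target bucket may be incomplete.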
