bvaradar commented on issue #1828: URL: https://github.com/apache/hudi/issues/1828#issuecomment-658764479
@kirkuz: AWS Athena support for Hudi has just been released: https://aws.amazon.com/about-aws/whats-new/2020/07/amazon-athena-adds-support-querying-apache-hudi-datasets-amazon-s3-based-data-lake/ With this, your query should not see any duplicate records. Duplicate records can only appear if the table is not defined with the correct input format. At least one previous version is kept so that queries do not fail while a concurrent write is happening.
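
To illustrate the point about duplicates, here is a toy sketch in Python (not Hudi code; the file-group ids, commit times, and function names are all made up for the example). It assumes a table that holds two versions of the same file group, as happens when a previous version is retained. A reader that scans every file sees the same record keys twice, while a reader that picks only the latest version per file group does not.

```python
# Hypothetical toy model: each tuple is (file_group_id, commit_time, record_keys).
# None of these names are Hudi APIs; they only illustrate the idea.
base_files = [
    ("fg-1", "001", ["a", "b"]),        # older version, retained for running queries
    ("fg-1", "002", ["a", "b", "c"]),   # newer version written by a later commit
    ("fg-2", "001", ["x"]),
]


def read_all_versions(files):
    """Scan every file, as a reader unaware of file versions would."""
    return [key for _, _, keys in files for key in keys]


def read_latest_versions(files):
    """Keep only the newest version of each file group before reading."""
    latest = {}
    for group, commit, keys in files:
        # Commit times are fixed-width strings here, so string comparison orders them.
        if group not in latest or commit > latest[group][0]:
            latest[group] = (commit, keys)
    return [key for group in sorted(latest) for key in latest[group][1]]


naive = read_all_versions(base_files)
filtered = read_latest_versions(base_files)

print("all versions:", naive)
print("latest only: ", filtered)
```

In this sketch the older version of `fg-1` is still on disk, so a query that started before commit `002` could keep reading it, but only the version-aware reader avoids returning `a` and `b` twice.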
