noobarcitect opened a new issue #2461:
URL: https://github.com/apache/hudi/issues/2461


   We are in the POC stage of implementing Apache Hudi in our existing AWS 
data lake and pipeline, and we are stuck on one issue. It is as 
follows:
   1. We inserted a record into a Hudi table in copy-on-write (COW) mode, then 
ran an upsert that updated the record we had initially inserted.
   2. The Hudi table is then crawled by an AWS Glue crawler.
   3. When we query the table from Athena, we get all 3 records, but we want 
the Athena query to return only the latest (upserted) record.
   4. One cause we found is that the Glue crawler treats the Hudi files as 
plain Parquet files and registers the input format as MapredParquetInputFormat 
rather than HoodieParquetInputFormat.
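Until crawlers can detect Hudi tables, one possible workaround is to patch the crawled table's input format in the Glue Data Catalog. Below is a minimal Python sketch that assumes boto3 credentials and a non-partitioned COW table. The helper names `to_hudi_table_input` and `patch_glue_table` are hypothetical. The sketch copies the table returned by `get_table`, keeps only the fields that `update_table` accepts as `TableInput`, and sets `StorageDescriptor.InputFormat` to `org.apache.hudi.hadoop.HoodieParquetInputFormat`.

```python
import copy

# Hudi's Hive input format class for copy-on-write snapshot reads.
HUDI_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"

# Keys Glue's UpdateTable accepts in TableInput; get_table also returns
# read-only fields (DatabaseName, CreateTime, ...) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def to_hudi_table_input(table: dict) -> dict:
    """Hypothetical helper: build a TableInput whose InputFormat points at Hudi.

    The input dict is deep-copied, so the caller's table is left unchanged.
    """
    table_input = {
        k: copy.deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS
    }
    sd = table_input.setdefault("StorageDescriptor", {})
    sd["InputFormat"] = HUDI_INPUT_FORMAT
    return table_input


def patch_glue_table(database: str, table_name: str) -> None:
    """Hypothetical helper: rewrite a crawled table's input format in Glue."""
    import boto3  # imported lazily so the pure helper above needs no AWS SDK

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(DatabaseName=database, TableInput=to_hudi_table_input(table))
```

Re-running the crawler may overwrite this change unless its schema change policy logs updates instead of applying them (`UpdateBehavior: LOG`). For partitioned tables, each partition's own storage descriptor may also still carry the old input format.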
   
   Q: Will Glue crawlers support identifying HoodieParquetInputFormat as the 
input format?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


