Hi Vinoth,

Thanks for the reply. Our design is to use Glue for ETL processing. We have to support both real-time IoT data and batch ETL flows (JDBC sources and static files such as CSV). The access layer would be a Presto cluster running on EC2 within the AWS environment.

We would like to use Hudi's historization of the data, as it is one of our requirements. My impression is that Hudi is getting a lot of attention from AWS now that it is integrated into EMR. What I don't see are use cases running in the Glue environment: all the documentation mentions EMR. My questions would be:

* How difficult would it be to integrate Hudi with AWS Glue?
* Is the Glue metadata catalog fully supported for Hudi tables?
* Is the Glue crawler able to crawl and catalog Hudi tables?
* Is there any plan for Athena to support access to Hudi tables in the future?

I understand that these questions should be addressed to the AWS guys; I am hoping that some of them are on this channel.

Regards,
Jorge

-----Original Message-----
From: Vinoth Chandar <[email protected]>
Sent: Friday, March 6, 2020 6:43 PM
To: [email protected]
Subject: Re: running Hudi in AWS Glue Spark

EXTERNAL EMAIL – Use caution with any links or file attachments.

https://aws.amazon.com/emr/features/hudi/ mentions that it's integrated with the Glue catalog. It should be similar to other datasources you use on Glue, IIUC. I have seen users talk about this on Slack (IIRC). Are you running into specific issues we can help with? Maybe the AWS folks here can chime in more?

On Fri, Mar 6, 2020 at 3:47 AM Sanchez, Jorge <[email protected]> wrote:

> Hello,
>
> Has anybody tried to run Hudi within an AWS Glue job? I searched the JIRA
> issues but did not find anybody mentioning it.
>
> Thanks,
>
> Jorge
> Notice: This e-mail message, together with any attachments, contains
> information of Merck & Co., Inc. (2000 Galloping Hill Road,
> Kenilworth, New Jersey, USA 07033), and/or its affiliates Direct
> contact information for affiliates is available at
> http://www.merck.com/contact/contacts.html) that may be confidential,
> proprietary copyrighted and/or legally privileged. It is intended
> solely for the use of the individual or entity named on this message.
> If you are not the intended recipient, and have received this message
> in error, please notify us immediately by reply e-mail and then delete
> it from your system.
