Hi Vinoth,

Thanks for the reply. Our design is to use Glue for ETL processing. We 
need to support both real-time IoT data and batch ETL flows (JDBC sources 
and static files such as CSV). 
The access layer would be a Presto cluster running on EC2 within the AWS 
environment. 

We would like to use the historization of the data, as it is one of our 
requirements. My impression is that Hudi is getting a lot of attention from 
AWS now that it is integrated into EMR. What I don't see are use cases in 
the Glue environment: all the documentation mentions EMR. 

My questions would be:
* How difficult would it be to integrate Hudi with AWS Glue?
* Is the Glue metadata catalog fully supported for Hudi tables?
* Is the Glue crawler able to crawl and catalog Hudi tables?
* Is there any plan for Athena to support access to Hudi tables in the 
future?

I understand that these questions should be addressed to the AWS guys; I am 
hoping that some of them are on this channel. 

Regards,

Jorge

-----Original Message-----
From: Vinoth Chandar <[email protected]> 
Sent: Friday, March 6, 2020 6:43 PM
To: [email protected]
Subject: Re: running Hudi in AWS Glue Spark

https://aws.amazon.com/emr/features/hudi/ mentions that it's integrated with the 
Glue catalog.

It should be similar to other data sources you use on Glue, IIUC. I have seen 
users talk about this on Slack (IIRC).
Are you running into specific issues we can help with? Maybe the AWS folks 
here can chime in more?

On Fri, Mar 6, 2020 at 3:47 AM Sanchez, Jorge 
<[email protected]> wrote:

> Hello,
>
> Has anybody tried to run Hudi within an AWS Glue job? I searched the JIRA 
> issues but did not find anybody mentioning it.
>
>
> Thanks,
>
> Jorge
> Notice:  This e-mail message, together with any attachments, contains 
> information of Merck & Co., Inc. (2000 Galloping Hill Road, 
> Kenilworth, New Jersey, USA 07033), and/or its affiliates Direct 
> contact information for affiliates is available at
> http://www.merck.com/contact/contacts.html) that may be confidential, 
> proprietary copyrighted and/or legally privileged. It is intended 
> solely for the use of the individual or entity named on this message. 
> If you are not the intended recipient, and have received this message 
> in error, please notify us immediately by reply e-mail and then delete 
> it from your system.
>