I understand more about your use case now too, Raymond! Thanks
for the response!

On Fri, Mar 6, 2020 at 7:02 PM Shiyan Xu <[email protected]>
wrote:

> I can answer this as my team faces exactly the same problems.
> We recently sync'ed up with AWS EMR team and got some directions.
>
> Hudi dataset <> Glue
> An interim approach is needed: configure an S3 notification to detect the
> new commit file after each compaction and, upon the notification, update a
> manifest file for Glue to pick up.
> This is a workaround until Athena officially supports Hudi datasets.
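
A rough sketch of the detection half of that interim approach, in Python. It assumes the notification arrives in the standard S3 event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`) and that a completed commit shows up as `<table path>/.hoodie/<instant time>.commit`. The function name `new_commits` and the key pattern are illustrative, not something the EMR team prescribed, and the manifest update itself is left out because its format was not described in this thread.

```python
import re
from urllib.parse import unquote_plus

# Assumption: a completed commit is written as
# "<table path>/.hoodie/<instant time>.commit".
COMMIT_KEY = re.compile(r"^(?P<table>.+)/\.hoodie/(?P<instant>\d+)\.commit$")


def new_commits(event):
    """Return (bucket, table_path, instant) for each commit file in an S3 event."""
    found = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        match = COMMIT_KEY.match(key)
        if bucket and match:
            found.append((bucket, match.group("table"), match.group("instant")))
    return found
```

A notification handler (for example a Lambda function) could call `new_commits(event)` and refresh the manifest only when the returned list is non-empty, so that ordinary data-file writes do not trigger an update.
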
>
> Athena support
> This is planned, but no definite timeline was given. The high-level approach
> is to use the Athena Hive external metadata store
> <https://docs.aws.amazon.com/athena/latest/ug/connect-to-data-source-hive.html>,
> but Athena needs some changes to adapt to Hudi datasets.
>
> My team's considerations are: the interim approach should work nicely
> but requires additional operational effort.
> We have an alternative plan of using the new Hudi snapshot exporter
> feature (https://issues.apache.org/jira/browse/HUDI-344), which is about
> to be merged.
> It exports a Hudi dataset to plain parquet files that work natively
> with Athena or Glue. We don't have very low latency requirements at the
> moment, so periodic export works for us.
> The feature should be available in 6.0, but the class can be used as a
> standalone tool.
>
> On Fri, Mar 6, 2020 at 6:26 PM Sanchez, Jorge
> <[email protected]> wrote:
>
> > Hi Vinoth,
> >
> > Thanks for the reply. Our design is to use Glue for ETL processing. We
> > would have to support both real-time IoT data and batch ETL flows (JDBC
> > sources and static files such as CSV).
> > The access layer would be a Presto cluster running on EC2 within the
> > AWS environment.
> >
> > We would like to use the historization of the data, as it is one of the
> > requirements. My impression is that Hudi is getting a lot of attention
> > from AWS, as it is now mainstreamed into EMR. What I don't see are use
> > cases in the Glue environment; all the documentation mentions EMR.
> >
> > My questions would be:
> > * How difficult would it be to integrate Hudi with AWS Glue?
> > * Is the Glue metadata catalog fully supported for Hudi tables?
> > * Is the Glue crawler able to crawl and catalog Hudi tables?
> > * Is there any plan for Athena to support access to Hudi tables in the
> >   future?
> >
> > I understand that these questions should be addressed to the AWS guys;
> > I am hoping that some of them are on this channel.
> >
> > Regards,
> >
> > Jorge
> >
> > -----Original Message-----
> > From: Vinoth Chandar <[email protected]>
> > Sent: Friday, March 6, 2020 6:43 PM
> > To: [email protected]
> > Subject: Re: running Hudi in AWS Glue Spark
> >
> > https://aws.amazon.com/emr/features/hudi/ mentions that it's integrated
> > with the Glue catalog.
> >
> > It should be similar to other data sources you use on Glue, IIUC. I have
> > seen users talk about this on Slack (IIRC).
> > Are you running into specific issues we can help with? Maybe the AWS
> > folks here can chime in more?
> >
> > On Fri, Mar 6, 2020 at 3:47 AM Sanchez, Jorge <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > Has anybody tried to run Hudi within an AWS Glue job? I searched the JIRA
> > > issues but did not find anybody mentioning it.
> > >
> > >
> > > Thanks,
> > >
> > > Jorge
> > > Notice:  This e-mail message, together with any attachments, contains
> > > information of Merck & Co., Inc. (2000 Galloping Hill Road,
> > > Kenilworth, New Jersey, USA 07033), and/or its affiliates Direct
> > > contact information for affiliates is available at
> > > http://www.merck.com/contact/contacts.html) that may be confidential,
> > > proprietary copyrighted and/or legally privileged. It is intended
> > > solely for the use of the individual or entity named on this message.
> > > If you are not the intended recipient, and have received this message
> > > in error, please notify us immediately by reply e-mail and then delete
> > > it from your system.
> > >
> >
>
