Hi Udit,

Thanks for your recommendation. I was able to get the jars for 0.5.1. As a
test, we ran Hudi against a small dataset (~2 million rows, 80 columns) stored
as Parquet, using 10 executors (m5.xlarge). The initial load alone is taking
2+ hours. Do you have any suggestions on settings I can change to speed up the
process?
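One place to start is the Hudi write configuration used for the initial load. The Python sketch below only builds an options dictionary. It assumes a one-off load written through the Spark datasource, and the table and column names in it (my_table, id, updated_at, load_date) are hypothetical placeholders. It sets the operation to bulk_insert, which the Hudi documentation describes as the operation intended for initial loads, and sets shuffle parallelism explicitly from the cluster size instead of relying on the defaults. An m5.xlarge instance has 4 vCPUs, so 10 executors give roughly 40 task slots if each executor gets one instance's cores. That executor layout is an assumption.

```python
def hudi_initial_load_options(table_name, record_key, precombine_field,
                              partition_field, parallelism):
    """Return Hudi Spark datasource write options for a one-off initial load.

    Values are strings because Spark datasource options are string-valued.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be a positive integer")
    p = str(parallelism)
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # bulk_insert: same semantics as insert, but a sort-based write path
        # that Hudi documents as the choice for initial/bootstrap loads.
        "hoodie.datasource.write.operation": "bulk_insert",
        # Size shuffles to the cluster rather than leaving the defaults.
        "hoodie.bulkinsert.shuffle.parallelism": p,
        "hoodie.insert.shuffle.parallelism": p,
        "hoodie.upsert.shuffle.parallelism": p,
    }


if __name__ == "__main__":
    # Hypothetical names; 10 executors x 4 vCPUs (m5.xlarge) = 40 slots.
    opts = hudi_initial_load_options("my_table", "id", "updated_at",
                                     "load_date", parallelism=10 * 4)
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

In PySpark the dictionary would be passed as df.write.format("org.apache.hudi").options(**opts).mode("overwrite").save(path), where path is your table location. Treat the parallelism value as a starting point to measure against, not a tuned setting.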

Thanks
Syed Zaidi

________________________________
From: Mehrotra, Udit <[email protected]>
Sent: Tuesday, March 17, 2020 8:08 PM
To: [email protected] <[email protected]>; Syed Zaidi <[email protected]>
Subject: Re: Question

Hi Zaidi,

You should be able to use Hudi 0.5.1 in the next EMR release, which should be
out fairly soon, but we can't give you an ETA. Meanwhile, nothing really stops
you from building your own Hudi 0.5.1 jars and replacing the ones on the EMR
cluster. The jars are located on the master node at /usr/lib/hudi/. Just
replace the 0.5.0 jars there and make the symlinked jars point to your 0.5.1
jars.
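A minimal Python sketch of the symlink step, under several assumptions: the 0.5.1 jars have already been copied into /usr/lib/hudi/ next to the 0.5.0 ones, the stable jar names there are symlinks to version-suffixed files, and the suffixes are exactly 0.5.0-incubating and 0.5.1-incubating. The function name repoint_hudi_symlinks is hypothetical, and the actual layout should be checked with ls -l /usr/lib/hudi/ before running anything.

```python
import os
from pathlib import Path

# Assumed version suffixes; verify against the real file names first.
OLD_VERSION = "0.5.0-incubating"
NEW_VERSION = "0.5.1-incubating"


def repoint_hudi_symlinks(hudi_dir, old=OLD_VERSION, new=NEW_VERSION):
    """Repoint each symlink in hudi_dir whose target names the old version
    to the same jar name with the new version. Returns (link, new_target) pairs."""
    changed = []
    # sorted() materializes the listing, so temp links created below are not visited.
    for entry in sorted(Path(hudi_dir).iterdir()):
        if not entry.is_symlink():
            continue
        target = os.readlink(entry)
        if old not in target:
            continue
        new_target = target.replace(old, new)
        # Relative link targets are resolved against the link's own directory.
        resolved = (Path(new_target) if os.path.isabs(new_target)
                    else entry.parent / new_target)
        if not resolved.exists():
            raise FileNotFoundError(f"replacement jar not found: {resolved}")
        # Create a temporary link, then rename it over the old one; on POSIX
        # the rename is atomic, so the stable name never disappears.
        tmp = entry.with_name(entry.name + ".tmp")
        os.symlink(new_target, tmp)
        os.replace(tmp, entry)
        changed.append((entry.name, new_target))
    return changed
```

Run it on the master node as a user that can write to /usr/lib/hudi/ (typically root), e.g. repoint_hudi_symlinks("/usr/lib/hudi"). It only repoints links and refuses to point at a jar that is not there; removing the old 0.5.0 jars afterwards is a separate step.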

Thanks,
Udit Mehrotra
SDE | AWS EMR

On 3/17/20, 5:34 PM, "Syed Zaidi" <[email protected]> wrote:

    Hi,

    AWS EMR emr-5.29.0 comes with Spark 2.4.4 and Hudi 0.5.0
    (hudi-hadoop-mr-bundle-0.5.0-incubating.jar). Version 0.5.1 adds new
    options for reading AWS DMS change logs using DeltaStreamer. Do you have
    any idea when AWS will support the newer version of Hudi? What options do I
    have to upgrade Hudi to the latest version when creating the EMR cluster, so
    that it supports the AWS DMS payload out of the box?

    Would appreciate your feedback in this regard.

    Thanks
    Syed Zaidi


Reply via email to