Hi Syed,

Please join the mailing list so your responses make it here without needing
approval.

I am sure there is something odd going on here. A few things to check:

- Hudi does use memory for caching inputs and computing heuristics. I have
seen slowness caused by insufficient executor memory. Can you try a
larger heap size and tune GC? (explained in
https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide)
- There is also a performance bug we fixed in 0.5.2. Can you try
setting hoodie.memory.merge.max.size=2147483648 (2GB of merge memory)?
(The initial load should just be doing an insert, so this may be unrelated,
but it is still something to keep in mind.)
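
To make the two suggestions above concrete, here is a minimal sketch that
collects them in one place. Only hoodie.memory.merge.max.size and its value
come from this thread. The executor memory ("8g") and the G1 GC flag are
placeholder assumptions, not values recommended here, and the helper names
are made up for illustration. See the tuning guide for actual guidance.

```python
# Sketch only: gather the settings discussed above as plain dictionaries.
GIB = 1024 ** 3  # bytes in one GiB


def tuning_settings(executor_memory="8g", merge_memory_gib=2):
    """Return (spark_conf, hudi_options).

    executor_memory and the GC flag are illustrative assumptions;
    merge_memory_gib=2 reproduces the 2147483648 bytes mentioned above.
    """
    spark_conf = {
        # Standard Spark properties; the values are placeholders.
        "spark.executor.memory": executor_memory,
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    }
    hudi_options = {
        # Hudi expects this limit in bytes.
        "hoodie.memory.merge.max.size": str(merge_memory_gib * GIB),
    }
    return spark_conf, hudi_options


def as_spark_submit_args(spark_conf):
    """Flatten Spark properties into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(spark_conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


spark_conf, hudi_options = tuning_settings()
print(as_spark_submit_args(spark_conf))
print(hudi_options)
```

How the Hudi option is handed to the job depends on how the write is launched
(for example as a writer option or a properties file), so that part is left out.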

If you can open a GitHub issue with a Spark UI screenshot, the data size,
etc., I am happy to take a look.

thanks
vinoth




On Wed, Mar 18, 2020 at 4:37 PM Syed Zaidi <[email protected]>
wrote:

> Hi Udit,
>
> Thanks for your recommendation. I was able to get the jars for 0.5.1. As a
> test we ran Hudi against a small dataset (~2 million rows with 80 columns)
> in a Parquet file against 10 executors (m5.xlarge). The initial load itself
> is taking 2+ hours. Do you have any suggestions on the settings I can
> update to speed up the process?
>
> Thanks
> Syed Zaidi
>
> ________________________________
> From: Mehrotra, Udit <[email protected]>
> Sent: Tuesday, March 17, 2020 8:08 PM
> To: [email protected] <[email protected]>; Syed Zaidi <
> [email protected]>
> Subject: Re: Question
>
> Hi Zaidi,
>
> You should be able to use Hudi 0.5.1 in the next EMR release, which should
> be fairly soon, but we can't give you an ETA. Meanwhile, there is nothing
> really stopping you from building your own Hudi 0.5.1 jars and replacing the
> ones on the EMR cluster. The jars are located on the master node at
> /usr/lib/hudi/. Just replace the 0.5.0 jars there and have the symlink jars
> point to your 0.5.1 jars.
>
> Thanks,
> Udit Mehrotra
> SDE | AWS EMR
>
> On 3/17/20, 5:34 PM, "Syed Zaidi" <[email protected]> wrote:
>
>     Hi,
>
>     AWS EMR emr-5.29.0 comes with Spark 2.4.4 and Hudi 0.5.0 (
> hudi-hadoop-mr-bundle-0.5.0-incubating.jar). In version 0.5.1 we have new
> options for reading the AWS DMS change logs using DeltaStreamer. Do you
> guys have any idea when AWS will support the newer version of Hudi? What
> options do I have to upgrade Hudi to the latest version while creating the
> EMR cluster to support the AWS DMS payload out of the box?
>
>     Would appreciate your feedback in this regard.
>
>     Thanks
>     Syed Zaidi
>
>
>
