Re: [I] Hudi job hangs forever [hudi]

via GitHub Thu, 04 Jul 2024 14:27:16 -0700


neerajpadarthi commented on issue #11540:
URL: https://github.com/apache/hudi/issues/11540#issuecomment-2209577056

Thanks for sharing this link. In our case, the upserts are relatively small,
touching only tens to hundreds of files, so based on the benchmarks in the link
direct markers should work well for us. For the initial bulk insert load,
however, we will enable the timeline server so that marker requests are
batched and we avoid S3 throttling errors.
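
A minimal sketch of that plan, for illustration only. The helper `marker_options` and its `bulk_load` flag are hypothetical names; the keys `hoodie.write.markers.type` and `hoodie.embed.timeline.server` are Hudi write configs, but the values that suit a given workload are an assumption and should be checked against the docs for the Hudi version in use.

```python
def marker_options(bulk_load: bool) -> dict:
    """Return Hudi writer options for the marker mechanism (hypothetical helper)."""
    if bulk_load:
        # Initial bulk insert: route marker creation through the embedded
        # timeline server so marker requests are batched instead of issuing
        # one S3 request per marker file.
        return {
            "hoodie.write.markers.type": "TIMELINE_SERVER_BASED",
            "hoodie.embed.timeline.server": "true",
        }
    # Small upserts touching tens to hundreds of files: write marker files
    # directly to storage.
    return {"hoodie.write.markers.type": "DIRECT"}


if __name__ == "__main__":
    print(marker_options(bulk_load=True))
    print(marker_options(bulk_load=False))
```

The returned dict could be passed to the Spark writer, e.g. `df.write.format("hudi").options(**marker_options(False))`, alongside the usual table options.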

@ad1happy2go - Is the timeout issue fixed in 0.14? We will evaluate it once
Redshift supports 0.14. Please let us know.
Also, I am seeing the difference below in the indexing stage before and
after the marker change during upserts. Can you please help me understand the
runtime difference? My guess is that the "Load latest base files from all
partitions" job finished quickly with direct markers because it did not have
the overhead of setting up the timeline server. However, the "Obtain key
ranges for file slices (range pruning=on)" job takes significantly longer
after disabling the timeline-server-based markers. What accounts for the
difference in range pruning between direct and timeline-server-based markers?

Timeline-server-based markers

![image](https://github.com/apache/hudi/assets/42651065/9fd08e3e-3925-44d5-9390-0313f7fcd434)

Direct markers

![image](https://github.com/apache/hudi/assets/42651065/9c2836dc-a4e2-426f-8b92-463e1ec2d4db)
