neerajpadarthi commented on issue #11540: URL: https://github.com/apache/hudi/issues/11540#issuecomment-2209577056
Thanks for sharing this link. In our case the upserts are relatively small and touch only a few files (10 to 100s of files), so based on the benchmarking details in the link, direct markers should work well for us. For the initial bulk insert load, however, we will enable the timeline server so that marker creation requests are batched and we avoid S3 throttling errors.

@ad1happy2go, is the timeout issue fixed in 0.14? We will evaluate it once Redshift supports 0.14. Please let us know.

I am also observing the following difference in the indexing stage before and after the marker change during upserts. Can you please help me understand the runtime difference? My guess is that the "Load latest base files from all partitions" job finished quickly with direct markers because it did not have the overhead of setting up the timeline server. However, after disabling the timeline-server-based markers, the "Obtain key ranges for file slices (range pruning=on)" job takes significantly longer. Can you please help me understand how range pruning differs between direct markers and timeline-server-based markers?

[Screenshot: indexing-stage jobs with timeline-server-based markers]
[Screenshot: indexing-stage jobs with direct markers]
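For reference, a minimal sketch of how the two load phases described above could be configured: timeline-server-based markers for the initial bulk insert, direct markers for the small upserts. The helper name `marker_options_for` is hypothetical; `hoodie.write.markers.type`, `hoodie.embed.timeline.server` and `hoodie.datasource.write.operation` are standard Hudi write configs, but check their names and defaults against the Hudi version you actually run. Only the marker-related keys are shown; table name, record key, precombine field, etc. are deliberately left out.

```python
# Hypothetical helper (not part of Hudi): builds the marker-related Hudi
# writer options for each load phase. Other required options are omitted.

def marker_options_for(operation: str) -> dict:
    """Return Hudi write options that pick a marker type per operation.

    Assumption: the initial bulk_insert touches many files, so it uses
    timeline-server-based markers (marker requests are batched by the
    timeline server, which reduces S3 calls); the small upserts touching
    10 to 100s of files use direct markers.
    """
    if operation == "bulk_insert":
        return {
            "hoodie.datasource.write.operation": "bulk_insert",
            # Timeline-server-based markers rely on the embedded timeline server.
            "hoodie.embed.timeline.server": "true",
            "hoodie.write.markers.type": "TIMELINE_SERVER_BASED",
        }
    if operation == "upsert":
        return {
            "hoodie.datasource.write.operation": "upsert",
            # Each write task creates its own marker file directly in storage.
            "hoodie.write.markers.type": "DIRECT",
        }
    raise ValueError(f"unsupported operation: {operation!r}")


if __name__ == "__main__":
    for op in ("bulk_insert", "upsert"):
        print(op, marker_options_for(op))
```

With PySpark these options could be passed as `df.write.format("hudi").options(**marker_options_for("upsert")).mode("append").save(base_path)`, where `df` and `base_path` are assumed to exist and the remaining table options are added alongside.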
