neerajpadarthi commented on issue #11540:
URL: https://github.com/apache/hudi/issues/11540#issuecomment-2209577056

   Thanks for sharing this link. In our case, the upserts are relatively small, 
affecting only tens to hundreds of files, so based on the benchmarking details 
at that link, direct markers should perform well enough for us. However, for 
the first bulk insert load, we will enable the timeline server so that marker 
requests are batched and we avoid S3 throttling errors. 
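
   A minimal sketch of the plan above, assuming the standard Hudi writer options `hoodie.write.markers.type` and `hoodie.embed.timeline.server`. The helper name `hudi_marker_options` is hypothetical and only groups the options per operation. It is not taken from our actual job.

```python
def hudi_marker_options(operation: str) -> dict:
    """Pick the marker mechanism for a Hudi write based on the operation.

    Hypothetical helper for illustration; only the option keys are Hudi's.
    """
    if operation == "bulk_insert":
        # Initial load touches many files: batch marker creation through the
        # embedded timeline server to reduce the number of S3 requests.
        return {
            "hoodie.datasource.write.operation": "bulk_insert",
            "hoodie.embed.timeline.server": "true",
            "hoodie.write.markers.type": "TIMELINE_SERVER_BASED",
        }
    if operation == "upsert":
        # Small upserts touch few files: write marker files directly.
        return {
            "hoodie.datasource.write.operation": "upsert",
            "hoodie.write.markers.type": "DIRECT",
        }
    raise ValueError(f"unsupported operation: {operation}")


# Usage with a Spark DataFrame (df and base_path are placeholders):
# df.write.format("hudi").options(**hudi_marker_options("upsert")) \
#     .mode("append").save(base_path)
```

   Whether the embedded timeline server should also be turned off for the upsert path is left open here, so the sketch only switches the marker type.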
   
   @ad1happy2go - Is the timeout issue fixed in 0.14? We will evaluate it once 
Redshift supports 0.14. Please let us know. 
   Also, I am observing the difference below in the indexing stage before and 
after the marker change during upserts. Can you please help me understand the 
runtime difference? My guess is that the "Load latest base files from all 
partitions" job finished quickly with direct markers because it did not have 
the overhead of setting up the timeline server. However, the job "Obtain key 
ranges for file slices (range pruning=on)" takes significantly longer after 
disabling the timeline-server-based markers. What explains the difference in 
range pruning between direct markers and timeline-server-based markers?
   
   Timeline-server-based markers:
   
![image](https://github.com/apache/hudi/assets/42651065/9fd08e3e-3925-44d5-9390-0313f7fcd434)
   
   
   Direct markers:
   
![image](https://github.com/apache/hudi/assets/42651065/9c2836dc-a4e2-426f-8b92-463e1ec2d4db)
   
   
   

