rohangarg commented on PR #16481: URL: https://github.com/apache/druid/pull/16481#issuecomment-2133077298
@Akshat-Jain : I had some questions regarding the changes:

1. Is there an example of how much time the S3 writes take in real jobs? That could include the fraction of the overall job time they consume. It may also vary between small and large jobs, and with the concurrency in the system.
2. I see that parallelism is introduced via a thread pool. Does that pool ensure some fairness for each writer? Earlier, each writer got at least one thread instantly for writing. What are your thoughts on that requirement?
3. Also, adding tasks arbitrarily to the thread pool might mean that some parts for a writer have to wait to be uploaded. Is there any mechanism to get a summary of the wait times of the parts? I am asking because I'm not sure how a person would evaluate the performance of a write in a concurrent system.
4. I see some thread-safety work being done with semaphores in the output stream. Is the output stream expected to be thread safe, or is that done for coordination with the executor among multiple writers? If it is for coordination, should that code reside inside the thread-pool executor somehow? That is also tied to the backpressure mechanism being built per writer.
5. I find it weird that we're doing `ALL` uploads in the output stream using multipart uploads. That includes having the `initiateMultipart` call in the output stream's constructor, which I personally don't like. Are there any thoughts on improving that by not using multipart uploads (and rather using a plain `PUT` request) for small uploads?

Please let me know your thoughts, if that's possible! :+1:
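To make question 3 concrete, here is a minimal sketch of one way the queue wait of each part could be summarized per writer. It is not taken from the PR: `UploadWaitTimeSketch`, `WaitStats`, `submitTimed`, and `sleepQuietly` are hypothetical names, and the only assumption is that part uploads are submitted to a shared `java.util.concurrent.ExecutorService`. It measures waiting only and does nothing about fairness between writers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class UploadWaitTimeSketch
{
  /** Hypothetical per-writer accumulator for the time parts spend queued before upload starts. */
  public static final class WaitStats
  {
    private final AtomicLong parts = new AtomicLong();
    private final AtomicLong totalWaitNanos = new AtomicLong();
    private final AtomicLong maxWaitNanos = new AtomicLong();

    void record(long waitNanos)
    {
      parts.incrementAndGet();
      totalWaitNanos.addAndGet(waitNanos);
      maxWaitNanos.accumulateAndGet(waitNanos, Math::max);
    }

    public long parts() { return parts.get(); }
    public long totalWaitNanos() { return totalWaitNanos.get(); }
    public long maxWaitNanos() { return maxWaitNanos.get(); }
  }

  /** Submits an upload task and records how long it sat in the pool's queue before it started running. */
  public static Future<?> submitTimed(ExecutorService pool, WaitStats stats, Runnable upload)
  {
    final long enqueuedAt = System.nanoTime();
    return pool.submit(() -> {
      stats.record(System.nanoTime() - enqueuedAt);
      upload.run();
    });
  }

  /** Stand-in for a part upload; a real task would call the S3 client instead. */
  public static void sleepQuietly(long millis)
  {
    try {
      Thread.sleep(millis);
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws Exception
  {
    // One thread, so the second part has to wait for the first one to finish.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    WaitStats stats = new WaitStats();

    Future<?> first = submitTimed(pool, stats, () -> sleepQuietly(200));
    Future<?> second = submitTimed(pool, stats, () -> sleepQuietly(200));
    first.get();
    second.get();
    pool.shutdown();

    System.out.println("parts=" + stats.parts()
                       + " totalWaitMs=" + TimeUnit.NANOSECONDS.toMillis(stats.totalWaitNanos())
                       + " maxWaitMs=" + TimeUnit.NANOSECONDS.toMillis(stats.maxWaitNanos()));
  }
}
```

Something along these lines, logged or emitted as a metric when the stream is closed, would let a person separate time spent waiting for a thread from time spent actually uploading.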
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
