rohangarg commented on PR #16481:
URL: https://github.com/apache/druid/pull/16481#issuecomment-2133077298

   @Akshat-Jain: I have a few questions about the changes:
   1. Is there an example of how much time the S3 writes take in real jobs, 
ideally including the fraction of the overall job time they consume? That may 
also vary between small and large jobs, and with the concurrency in the 
system.
   2. I see that parallelism is introduced via a thread pool in the system. 
Does that pool ensure some fairness across writers? Earlier, each writer got 
at least one thread immediately for writing. What are your thoughts on that 
requirement?
   3. Also, adding tasks to the thread pool arbitrarily might mean that some 
parts for a writer have to wait before they are uploaded. Is there any 
mechanism to get a summary of the wait times of the parts? I am asking because 
I'm not sure how a person would evaluate the performance of a write in a 
concurrent system.
   4. I see some thread-safety handling with semaphores in the output stream. 
Is the output stream expected to be thread-safe, or is that done for 
coordination with the executor among multiple writers? If it is for 
coordination, should that code reside inside the thread-pool executor somehow? 
It is also tied to the backpressure mechanism being built per writer.
   5. I find it odd that we're doing `ALL` uploads in the output stream using 
multipart uploads. That includes an `initiateMultipart` call in the output 
stream's constructor, which I personally don't like. Are there any thoughts on 
improving that by using a plain `PUT` request instead of a multipart upload 
for small uploads?
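
   To make points 3 and 4 concrete, here is a rough sketch of what I have in 
mind. It is not taken from the PR: `PartWaitTimeSketch`, `Writer`, 
`submitPart` and `awaitAll` are hypothetical names, and the fixed-size pool is 
only a stand-in for whatever shared executor is used. Each writer bounds its 
own in-flight parts with a semaphore (per-writer backpressure) and records how 
long each part sat in the shared queue before a thread picked it up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not code from the PR: all names here are made up.
class PartWaitTimeSketch {

    /** One instance per writer; parts from all writers run on the shared pool. */
    static final class Writer {
        private final ExecutorService sharedPool;
        private final Semaphore inFlight; // per-writer backpressure
        private final LongSummaryStatistics waitNanos = new LongSummaryStatistics();
        private final List<Future<?>> pending = new ArrayList<>();

        Writer(ExecutorService sharedPool, int maxInFlightParts) {
            this.sharedPool = sharedPool;
            this.inFlight = new Semaphore(maxInFlightParts);
        }

        /** Blocks the calling writer while it already has too many parts pending. */
        void submitPart(Runnable upload) throws InterruptedException {
            inFlight.acquire();
            final long enqueuedAt = System.nanoTime();
            try {
                pending.add(sharedPool.submit(() -> {
                    // Time spent queued before a pool thread picked this part up.
                    long waited = System.nanoTime() - enqueuedAt;
                    synchronized (waitNanos) {
                        waitNanos.accept(waited);
                    }
                    try {
                        upload.run();
                    } finally {
                        inFlight.release();
                    }
                }));
            } catch (RuntimeException e) {
                inFlight.release(); // the pool rejected the task, so free the permit
                throw e;
            }
        }

        /** Waits for all submitted parts and returns the queue-wait summary. */
        LongSummaryStatistics awaitAll() throws Exception {
            for (Future<?> f : pending) {
                f.get();
            }
            synchronized (waitNanos) {
                return waitNanos;
            }
        }
    }

    private static void simulateUpload() {
        try {
            Thread.sleep(20); // stand-in for an actual part upload
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Writer writer = new Writer(pool, 4);
            for (int i = 0; i < 8; i++) {
                writer.submitPart(PartWaitTimeSketch::simulateUpload);
            }
            LongSummaryStatistics stats = writer.awaitAll();
            System.out.printf("parts=%d avgWaitMs=%.2f maxWaitMs=%.2f%n",
                    stats.getCount(), stats.getAverage() / 1e6, stats.getMax() / 1e6);
        } finally {
            pool.shutdown();
        }
    }
}
```

   A summary like this (count, average and maximum queue wait per writer) 
would let someone tell apart time spent uploading from time spent waiting for 
a thread. Note that the sketch does nothing for fairness between writers; it 
only makes the waiting visible.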
   
   
   Please let me know your thoughts, if that's possible! :+1:  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


