Akshat-Jain commented on PR #16481:
URL: https://github.com/apache/druid/pull/16481#issuecomment-2133300113
> @Akshat-Jain : I had some doubts regarding the changes:
>
> 1. Is there an example of how much time the S3 writes take in real jobs? That could include the fraction of the overall time being consumed as well. Also, maybe that varies for small jobs vs. large jobs and the concurrency in the system.
> 2. I see that parallelism is introduced via a thread pool in the system. Does that pool ensure some fairness for each writer? Earlier, each writer used to get at least one thread instantly for writing - what are your thoughts on that requirement?
> 3. Also, adding things arbitrarily to the thread pool might mean that some parts of a writer may have to wait to be uploaded. Is there any mechanism to get a summary of the wait times of the parts? I am asking this since I'm not sure how a person would evaluate the performance of a write in a concurrent system.
> 4. I see some thread-safety work being done with semaphores in the output stream - is the output stream expected to be thread safe? Or is that done for coordination with the executor among multiple writers? If it is for coordination, should that code reside inside the thread-pool executor somehow? That is also attached to the backpressure mechanism being built per writer.
> 5. I find it weird that we're doing `ALL` uploads in the output stream using multipart uploads, which includes having the `initiateMultipart` call in the output stream's constructor - something I personally don't like. Are there any thoughts on improving that by not using multipart uploads (and rather using a plain `PUT` request) for small uploads?
>
> Please let me know your thoughts, if that's possible! 👍
@rohangarg Thanks for the detailed questions!
1. I have put some timing data in the PR description: queries 5 and 6 give an idea of the degree to which concurrency helps output-heavy queries. That said, I can gather timing data across the following scenarios, which should let us deduce how much of the overall query time is spent on durable storage writes:
   1. MSQ without this PR's code changes + durable storage disabled
   2. MSQ without this PR's code changes + durable storage enabled
   3. MSQ with this PR's code changes + durable storage enabled
2. I don't think each writer previously got a thread instantly for writing. The writing was done by the processing thread pool, and it's possible to have more workers than processing threads. I haven't tried such a scenario, but in it not every writer would get a thread instantly, since there would be more workers/writers than processing threads. To answer your question about fairness in the PR changes: there's no such mechanism, but as described above, that seems like a separate problem from the one this PR is trying to address. (One possible shape of such a mechanism is sketched below.)
3. That's a good point. I can add logs for the wait times of all parts, but I felt it might be too much logging; one alternative would be to aggregate wait times per writer and log a single summary on stream close (see the summary sketch below). Will think more about this. Happy to hear any suggestions as well, if you have any! :)
4. The semaphore is there to cap the number of chunk files on local disk at any given point in time, which bounds disk usage (see the semaphore sketch below). It wasn't needed in the original code because disk usage was bounded automatically: the same thread deleted each chunk before writing the next one.
5. I think the rationale here is that we don't know in advance how big the overall file will be, hence we do multipart uploads for everything (see the lazy-multipart sketch below for one way to avoid that for small files). cc: @cryptoe to confirm this point though.
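
On point 2, this isn't something the PR implements, but to make the discussion concrete, here's a minimal sketch of what per-writer fairness could look like: each writer gets its own queue, and a dispatcher feeds the shared pool one task per writer per pass. All names here (`FairUploadDispatcher`, `dispatchOnePass`) are hypothetical, not classes from this PR:

```java
import java.util.Map;
import java.util.concurrent.*;

public class FairUploadDispatcher
{
  private final Map<String, BlockingQueue<Runnable>> queuesByWriter = new ConcurrentHashMap<>();
  private final ExecutorService pool;

  public FairUploadDispatcher(int threads)
  {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  public void submit(String writerId, Runnable uploadTask)
  {
    queuesByWriter.computeIfAbsent(writerId, k -> new LinkedBlockingQueue<>()).add(uploadTask);
  }

  /** One round-robin pass: take at most one pending task per writer, so no writer starves. */
  public void dispatchOnePass()
  {
    for (BlockingQueue<Runnable> queue : queuesByWriter.values()) {
      Runnable task = queue.poll();
      if (task != null) {
        pool.execute(task);
      }
    }
  }
}
```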
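On point 3, a middle ground between no visibility and per-part logging could be a per-writer aggregate that's logged once, e.g. when the output stream is closed. A minimal sketch with illustrative names, where "wait" means the time from enqueueing a part to the moment its upload actually starts on a pool thread:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class PartWaitStats
{
  private final LongAdder totalWaitMillis = new LongAdder();
  private final LongAdder partCount = new LongAdder();
  private final AtomicLong maxWaitMillis = new AtomicLong();

  /** Called on the pool thread just before the part upload begins. */
  public void recordWait(long enqueuedAtMillis)
  {
    long wait = System.currentTimeMillis() - enqueuedAtMillis;
    totalWaitMillis.add(wait);
    partCount.increment();
    maxWaitMillis.accumulateAndGet(wait, Math::max);
  }

  /** Logged once per writer, e.g. when the output stream is closed. */
  public String summary()
  {
    long count = partCount.sum();
    long avg = count == 0 ? 0 : totalWaitMillis.sum() / count;
    return String.format("parts=%d avgWaitMs=%d maxWaitMs=%d", count, avg, maxWaitMillis.get());
  }
}
```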
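On point 4, here's roughly the shape of the coordination described in answer 4: the semaphore provides backpressure so a writer can't create more than a bounded number of chunk files before the upload tasks delete the older ones. Names are illustrative, not the PR's actual classes:

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

public class BoundedChunkWriter
{
  private final Semaphore diskPermits;
  private final ExecutorService uploadPool;

  public BoundedChunkWriter(int maxFilesOnDisk, ExecutorService uploadPool)
  {
    this.diskPermits = new Semaphore(maxFilesOnDisk);
    this.uploadPool = uploadPool;
  }

  public void writeChunk(byte[] data) throws InterruptedException, IOException
  {
    diskPermits.acquire();             // backpressure: block if too many chunks are already on disk
    File chunk = File.createTempFile("chunk", ".bin");
    // ... write data to chunk ...
    uploadPool.execute(() -> {
      try {
        // ... upload chunk to S3 (hypothetical call) ...
      } finally {
        chunk.delete();                // free the disk space
        diskPermits.release();         // let the writer create another chunk
      }
    });
  }
}
```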
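On point 5, if we did want to avoid multipart for small files, one option is to defer the `initiateMultipart` call until the first full chunk, and fall back to a plain `PUT` on close if the buffer never filled. A hedged sketch only; `StorageClient` and its methods are hypothetical stand-ins, not the real S3 client used in the PR:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LazyMultipartOutputStream extends OutputStream
{
  interface StorageClient
  {
    void putObject(String key, byte[] data);    // hypothetical single plain PUT
    void startMultipart(String key);            // hypothetical; replaces the constructor-time initiateMultipart
    void uploadPart(String key, byte[] data);   // hypothetical
    void completeMultipart(String key);         // hypothetical
  }

  private final StorageClient client;
  private final String key;
  private final int chunkSize;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean multipartStarted = false;

  public LazyMultipartOutputStream(StorageClient client, String key, int chunkSize)
  {
    this.client = client;
    this.key = key;
    this.chunkSize = chunkSize;
  }

  @Override
  public void write(int b) throws IOException
  {
    buffer.write(b);
    if (buffer.size() >= chunkSize) {
      if (!multipartStarted) {
        client.startMultipart(key);   // initiated lazily, only once we know the file is big
        multipartStarted = true;
      }
      client.uploadPart(key, buffer.toByteArray());
      buffer.reset();
    }
  }

  @Override
  public void close()
  {
    if (!multipartStarted) {
      client.putObject(key, buffer.toByteArray());  // small file: one plain PUT, no multipart overhead
    } else {
      if (buffer.size() > 0) {
        client.uploadPart(key, buffer.toByteArray());
      }
      client.completeMultipart(key);
    }
  }
}
```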
Hope these answer your questions. Happy to discuss further on any of the
above points if needed, thanks! :)