rohangarg commented on PR #16481:
URL: https://github.com/apache/druid/pull/16481#issuecomment-2134521306

   > I have put some timing data in the PR description, where you can check 
queries 5 and 6 to get some idea of the degree to which concurrency helps in 
output heavy queries.
   
   Thanks, I checked earlier, but I couldn't make much sense of the timings. For instance, were you able to work out why the serial upload takes ~8 minutes for 450 MB? Also, the chunk size in those queries seemed a bit degenerate to me.
   
   > I don't think earlier each writer got a thread instantly for writing. The 
writing was being done by the processing threadpool, but it's possible to have 
more workers than the processing threadpool size. I haven't tried such a 
scenario so far, but I think under this scenario, not all writers will get a 
thread instantly as there are more workers/writers than the size of processing 
threadpool. To answer your question about fairness in the PR changes, there's 
no such mechanism. But as I described, this seems like a separate problem than 
what this PR is trying to address.
   
   What I actually meant is that currently the threads writing to the output stream can also write to S3 on the same thread, which might no longer be the case with this change: the writes might have to wait. Are we addressing that scenario in this change? Further, that brings me to another point I wanted to raise - have we considered supporting parallel uploads via the execution framework (say, an operator that fans out the writes) instead of implementing it per storage connector? If so, can you please share the reasons for not choosing that? One additional upside of doing it in the execution framework is that all storage connectors would then get parallel uploads automatically.
   
   > That's a good point. I can add logs for wait times of all parts, but I 
felt it might be too much logging. Will think more about this. Happy to hear 
any suggestions as well, if you have any! :)
   
   You can compute summary statistics over the wait times and post those in the logs - hopefully that's good enough :)
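   As a rough sketch of what I mean (all names here are illustrative, not from the PR): collect the per-part wait times and emit one summary line instead of a log line per part.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.LongSummaryStatistics;

// Illustrative sketch: aggregate per-part wait times into a single summary
// log line when the upload finishes, instead of logging every part.
public class UploadWaitStats
{
  static String summarize(long[] waitMillis)
  {
    LongSummaryStatistics stats = Arrays.stream(waitMillis).summaryStatistics();
    long[] sorted = waitMillis.clone();
    Arrays.sort(sorted);
    // p99 via nearest-rank on the sorted wait times
    long p99 = sorted[(int) Math.ceil(sorted.length * 0.99) - 1];
    return String.format(
        Locale.ROOT,
        "parts=%d waitMs[min=%d avg=%.1f p99=%d max=%d]",
        stats.getCount(), stats.getMin(), stats.getAverage(), p99, stats.getMax()
    );
  }

  public static void main(String[] args)
  {
    long[] waits = {12, 5, 30, 7, 250};
    // prints: parts=5 waitMs[min=5 avg=60.8 p99=250 max=250]
    System.out.println(summarize(waits));
  }
}
```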
   
   > Semaphore was used to make sure we restrict number of files locally to xyz 
number of files, to restrict disk space usage at any given point in time. 
Semaphore wasn't needed in the original code since the disk space was 
automatically restricted as the same thread was deleting the chunk before 
writing the next chunk.
   
   Yes, it makes sense to restrict the number of concurrent files. But I would enforce that in the executor-related code itself instead of leaking it into the output stream.
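   Something along these lines (a sketch, with hypothetical names - not Druid's actual classes): acquire a permit before the chunk task is submitted, and release it once the task has uploaded and deleted its local chunk, so the output stream never sees the semaphore.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: bound the number of chunk files on disk at the point
// of task submission, keeping the semaphore out of the output stream.
public class BoundedChunkUploader
{
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final Semaphore diskPermits;
  private final AtomicInteger inFlight = new AtomicInteger();
  final AtomicInteger maxInFlight = new AtomicInteger();

  BoundedChunkUploader(int maxLocalChunks)
  {
    this.diskPermits = new Semaphore(maxLocalChunks);
  }

  void submitChunk(Runnable uploadAndDeleteChunk) throws InterruptedException
  {
    diskPermits.acquire(); // the submitter blocks here, not the output stream
    pool.submit(() -> {
      try {
        int current = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(current, Math::max);
        uploadAndDeleteChunk.run();
      }
      finally {
        inFlight.decrementAndGet();
        diskPermits.release(); // chunk deleted, local disk space freed
      }
    });
  }

  void shutdown() throws InterruptedException
  {
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException
  {
    BoundedChunkUploader uploader = new BoundedChunkUploader(2);
    for (int i = 0; i < 8; i++) {
      uploader.submitChunk(() -> {
        try { Thread.sleep(20); } catch (InterruptedException ignored) {}
      });
    }
    uploader.shutdown();
    // never more than 2 chunks in flight despite 8 submissions
    System.out.println("max chunks in flight: " + uploader.maxInFlight.get());
  }
}
```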
   
   > I think the rationale here is that we don't know in advance how big the 
overall file would be. Hence we are doing multi-part uploads for everything.
   
   You can determine that the file is small when the output stream is closed before even one local chunk has been fully filled. And once you've determined that a file is small, a direct `PUT` can be used instead of a multi-part upload.
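   The decision itself is cheap - a sketch of the check I have in mind (method and parameter names are hypothetical): if no full chunk was ever rolled over before `close()`, the whole object fits in one chunk, so a single `PUT` avoids the initiate/complete round-trips of a multi-part upload.

```java
// Illustrative sketch of the small-file shortcut; not Druid's actual code.
public class UploadModeChooser
{
  enum Mode { DIRECT_PUT, MULTI_PART }

  /**
   * @param bytesWritten total bytes written when close() was called
   * @param chunkSize    configured size of one local chunk
   * @param chunksRolled number of full chunks already handed off for upload
   */
  static Mode choose(long bytesWritten, long chunkSize, int chunksRolled)
  {
    // No chunk ever filled up: everything still sits in the first (partial)
    // chunk, so the object is small and a direct PUT suffices.
    if (chunksRolled == 0 && bytesWritten < chunkSize) {
      return Mode.DIRECT_PUT;
    }
    return Mode.MULTI_PART;
  }

  public static void main(String[] args)
  {
    // 450 KB written, 5 MB chunks, stream closed before any chunk filled
    System.out.println(choose(450L * 1024, 5L * 1024 * 1024, 0));       // DIRECT_PUT
    // 12 MB written across multiple 5 MB chunks
    System.out.println(choose(12L * 1024 * 1024, 5L * 1024 * 1024, 2)); // MULTI_PART
  }
}
```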


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
