Akshat-Jain commented on PR #16481:
URL: https://github.com/apache/druid/pull/16481#issuecomment-2133300113
> @Akshat-Jain : I had some doubts regarding the changes:
>
> 1. Is there an example of how much time the S3 writes take in real jobs? That could include the fraction of the overall time being consumed as well. Also, maybe that varies for small jobs vs. large jobs and the concurrency in the system.
> 2. I see that parallelism is introduced via a thread pool in the system. Does that pool ensure some fairness for each writer? Earlier, each writer used to get at least one thread instantly for writing - what are your thoughts on that requirement?
> 3. Also, adding things arbitrarily to the thread pool might mean that some parts of a writer may have to wait to be uploaded. Is there any mechanism to get a summary of the wait times of the parts? I am asking this since I'm not sure how a person would evaluate the performance of a write in a concurrent system.
> 4. I see some thread-safety work being done with semaphores in the output stream - is the output stream expected to be thread safe? Or is that done for coordination with the executor among multiple writers? If it is for coordination, should that code reside inside the thread-pool executor somehow? That is also attached to the backpressure mechanism being built per writer.
> 5. I find it weird that we're doing `ALL` uploads in the output stream using multipart uploads, which includes having the `initiateMultipart` call in the output stream's constructor - something I personally don't like. Are there any thoughts on improving that by not using multipart uploads (and rather using a plain `PUT` request) for small uploads?
>
> Please let me know your thoughts, if that's possible! 👍
@rohangarg Thanks for the detailed questions!
1. I have put some timing data in the PR description: queries 5 and 6 give an idea of the degree to which concurrency helps output-heavy queries. That said, I can gather timing data across the following scenarios, which should let us deduce how much of the overall query time is spent on durable storage writes:
   1. MSQ without this PR's code changes + durable storage disabled
   2. MSQ without this PR's code changes + durable storage enabled
   3. MSQ with this PR's code changes + durable storage enabled
2. I don't think each writer previously got a thread instantly for writing. The writing was done by the processing thread pool, and it's possible to have more workers than processing threads. I haven't tried such a scenario, but in it not every writer would get a thread instantly, since there would be more workers/writers than processing threads. To answer your question about fairness in the PR changes: there's no such mechanism, but as described above, that seems like a separate problem from the one this PR is trying to address. (One possible shape of such a mechanism is sketched below.)
3. That's a good point. I can add logs for the wait times of all parts, but I felt it might be too much logging; one alternative would be to aggregate wait times per writer and log a single summary on stream close (see the summary sketch below). Will think more about this. Happy to hear any suggestions as well, if you have any! :)
4. The semaphore is there to cap the number of chunk files on local disk at any given point in time, which bounds disk usage (see the semaphore sketch below). It wasn't needed in the original code because disk usage was bounded automatically: the same thread deleted each chunk before writing the next one.
5. I think the rationale here is that we don't know in advance how big the overall file will be, hence we do multipart uploads for everything (see the lazy-multipart sketch below for one way to avoid that for small files). cc: @cryptoe to confirm this point though.
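
On point 2, this isn't something the PR implements, but to make the discussion concrete, here's a minimal sketch of what per-writer fairness could look like: each writer gets its own queue, and a dispatcher feeds the shared pool one task per writer per pass. All names here (`FairUploadDispatcher`, `dispatchOnePass`) are hypothetical, not classes from this PR:

```java
import java.util.Map;
import java.util.concurrent.*;

public class FairUploadDispatcher
{
  private final Map<String, BlockingQueue<Runnable>> queuesByWriter = new ConcurrentHashMap<>();
  private final ExecutorService pool;

  public FairUploadDispatcher(int threads)
  {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  public void submit(String writerId, Runnable uploadTask)
  {
    queuesByWriter.computeIfAbsent(writerId, k -> new LinkedBlockingQueue<>()).add(uploadTask);
  }

  /** One round-robin pass: take at most one pending task per writer, so no writer starves. */
  public void dispatchOnePass()
  {
    for (BlockingQueue<Runnable> queue : queuesByWriter.values()) {
      Runnable task = queue.poll();
      if (task != null) {
        pool.execute(task);
      }
    }
  }
}
```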
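On point 3, a middle ground between no visibility and per-part logging could be a per-writer aggregate that's logged once, e.g. when the output stream is closed. A minimal sketch with illustrative names, where "wait" means the time from enqueueing a part to the moment its upload actually starts on a pool thread:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class PartWaitStats
{
  private final LongAdder totalWaitMillis = new LongAdder();
  private final LongAdder partCount = new LongAdder();
  private final AtomicLong maxWaitMillis = new AtomicLong();

  /** Called on the pool thread just before the part upload begins. */
  public void recordWait(long enqueuedAtMillis)
  {
    long wait = System.currentTimeMillis() - enqueuedAtMillis;
    totalWaitMillis.add(wait);
    partCount.increment();
    maxWaitMillis.accumulateAndGet(wait, Math::max);
  }

  /** Logged once per writer, e.g. when the output stream is closed. */
  public String summary()
  {
    long count = partCount.sum();
    long avg = count == 0 ? 0 : totalWaitMillis.sum() / count;
    return String.format("parts=%d avgWaitMs=%d maxWaitMs=%d", count, avg, maxWaitMillis.get());
  }
}
```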
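On point 4, here's roughly the shape of the coordination described in answer 4: the semaphore provides backpressure so a writer can't create more than a bounded number of chunk files before the upload tasks delete the older ones. Names are illustrative, not the PR's actual classes:

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

public class BoundedChunkWriter
{
  private final Semaphore diskPermits;
  private final ExecutorService uploadPool;

  public BoundedChunkWriter(int maxFilesOnDisk, ExecutorService uploadPool)
  {
    this.diskPermits = new Semaphore(maxFilesOnDisk);
    this.uploadPool = uploadPool;
  }

  public void writeChunk(byte[] data) throws InterruptedException, IOException
  {
    diskPermits.acquire();             // backpressure: block if too many chunks are already on disk
    File chunk = File.createTempFile("chunk", ".bin");
    // ... write data to chunk ...
    uploadPool.execute(() -> {
      try {
        // ... upload chunk to S3 (hypothetical call) ...
      } finally {
        chunk.delete();                // free the disk space
        diskPermits.release();         // let the writer create another chunk
      }
    });
  }
}
```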
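On point 5, if we did want to avoid multipart for small files, one option is to defer the `initiateMultipart` call until the first full chunk, and fall back to a plain `PUT` on close if the buffer never filled. A hedged sketch only; `StorageClient` and its methods are hypothetical stand-ins, not the real S3 client used in the PR:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LazyMultipartOutputStream extends OutputStream
{
  interface StorageClient
  {
    void putObject(String key, byte[] data);    // hypothetical single plain PUT
    void startMultipart(String key);            // hypothetical; replaces the constructor-time initiateMultipart
    void uploadPart(String key, byte[] data);   // hypothetical
    void completeMultipart(String key);         // hypothetical
  }

  private final StorageClient client;
  private final String key;
  private final int chunkSize;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean multipartStarted = false;

  public LazyMultipartOutputStream(StorageClient client, String key, int chunkSize)
  {
    this.client = client;
    this.key = key;
    this.chunkSize = chunkSize;
  }

  @Override
  public void write(int b) throws IOException
  {
    buffer.write(b);
    if (buffer.size() >= chunkSize) {
      if (!multipartStarted) {
        client.startMultipart(key);   // initiated lazily, only once we know the file is big
        multipartStarted = true;
      }
      client.uploadPart(key, buffer.toByteArray());
      buffer.reset();
    }
  }

  @Override
  public void close()
  {
    if (!multipartStarted) {
      client.putObject(key, buffer.toByteArray());  // small file: one plain PUT, no multipart overhead
    } else {
      if (buffer.size() > 0) {
        client.uploadPart(key, buffer.toByteArray());
      }
      client.completeMultipart(key);
    }
  }
}
```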
Hope these answer your questions. Happy to discuss further on any of the
above points if needed, thanks! :)