Akshat-Jain commented on code in PR #16481:
URL: https://github.com/apache/druid/pull/16481#discussion_r1616592364
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/RetryableS3OutputStream.java:
##########
@@ -199,15 +211,47 @@ private void pushCurrentChunk() throws IOException
{
currentChunk.close();
final Chunk chunk = currentChunk;
- try {
- if (chunk.length() > 0) {
- resultsSize += chunk.length();
+ if (chunk.length() > 0) {
+ try {
+ SEMAPHORE.acquire(); // Acquire a permit from the semaphore
Review Comment:
@kfaraz I had a discussion with @cryptoe yesterday about this approach.
While it is theoretically possible, it would become very complicated and
error-prone to manage given how our code is structured, since there are
too many edge cases that can go wrong.
The `write()` method is called concurrently by multiple threads, so we run into
edge scenarios like the one below (a rough sketch of this failure mode follows the example):
```
Say the chunk size is 5 MB and the available disk space is 10 MB.
Thread 1 has written 3 MB.
Thread 2 has written 3 MB.
Thread 3 has written 3 MB.
Deadlock! Together the threads have consumed enough disk space to hit the
disk-space threshold, but no single thread has buffered enough data to reach
the chunk-size threshold, so none of them triggers an upload(), and all of
them block waiting for space that will never be freed.
```
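
For illustration, here is a rough Java sketch of that failure mode (names such as `DISK_BYTES`, `CHUNK_SIZE_BYTES`, and `bufferedBytes` are made up for this example and are not the actual Druid code): a byte-budget semaphore is only released when a chunk is uploaded, so writers that never reach the chunk threshold can exhaust the budget and block forever.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not the real RetryableS3OutputStream: shows why a
// byte-based disk budget can deadlock under concurrent writers.
class ByteBudgetDeadlockSketch
{
  private static final int CHUNK_SIZE_BYTES = 5 * 1024 * 1024;                  // 5 MB chunk threshold
  private static final Semaphore DISK_BYTES = new Semaphore(10 * 1024 * 1024);  // 10 MB disk budget

  private final ThreadLocal<Integer> bufferedBytes = ThreadLocal.withInitial(() -> 0);

  void write(byte[] data) throws InterruptedException
  {
    // Each writer reserves disk budget before buffering its bytes.
    DISK_BYTES.acquire(data.length);
    int buffered = bufferedBytes.get() + data.length;
    bufferedBytes.set(buffered);

    // Budget is only returned when a full chunk is uploaded. If three threads
    // each buffer 3 MB, 9 MB of the 10 MB budget is held, no thread has reached
    // the 5 MB chunk threshold, so no upload() runs, nothing is released, and
    // the next multi-MB write blocks forever.
    if (buffered >= CHUNK_SIZE_BYTES) {
      upload();
      DISK_BYTES.release(buffered);
      bufferedBytes.set(0);
    }
  }

  private void upload()
  {
    // Upload the buffered chunk to S3 (omitted in this sketch).
  }
}
```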
Based on yesterday's discussion with Karan, we concluded that we can use a
`number of chunks` criterion instead of bytes stored, and I have implemented
that approach via the `S3UploadConfig` class.
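
For clarity, here is a minimal sketch of the chunk-count idea (the field and method names are assumptions for the example, not necessarily the real `S3UploadConfig` or `RetryableS3OutputStream` API): a semaphore sized by the maximum number of concurrent chunks is acquired before a chunk is pushed and released once its upload completes, so the number of chunks buffered on local disk stays bounded regardless of how many bytes each writer has buffered.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of limiting concurrent chunks rather than buffered bytes.
class ChunkCountLimiterSketch
{
  private final Semaphore chunkPermits;

  ChunkCountLimiterSketch(int maxConcurrentNumChunks)
  {
    this.chunkPermits = new Semaphore(maxConcurrentNumChunks);
  }

  void pushChunk(Chunk chunk) throws InterruptedException
  {
    chunkPermits.acquire();      // block if too many chunks are already buffered on disk
    try {
      uploadPartToS3(chunk);     // upload the chunk (omitted in this sketch)
    }
    finally {
      chunk.delete();            // hypothetical cleanup of the local chunk file
      chunkPermits.release();    // free the slot so another chunk can be written
    }
  }

  private void uploadPartToS3(Chunk chunk)
  {
    // S3 multipart uploadPart call goes here.
  }

  interface Chunk
  {
    void delete();
  }
}
```

Bounding the chunk count rather than bytes avoids the deadlock above, because each permit is tied to a complete chunk that is guaranteed to be uploaded and then released.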
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]