Akshat-Jain commented on code in PR #16481:
URL: https://github.com/apache/druid/pull/16481#discussion_r1616592364
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/RetryableS3OutputStream.java:
##########
@@ -199,15 +211,47 @@ private void pushCurrentChunk() throws IOException
{
currentChunk.close();
final Chunk chunk = currentChunk;
- try {
- if (chunk.length() > 0) {
- resultsSize += chunk.length();
+ if (chunk.length() > 0) {
+ try {
+ SEMAPHORE.acquire(); // Acquire a permit from the semaphore
Review Comment:
@kfaraz I had a discussion with @cryptoe yesterday about this approach.
While it is theoretically possible, it would become very complicated and
error-prone to manage given how our code is structured, since there are
too many edge cases that can go wrong.
The `write()` method is called concurrently by multiple threads, so we run into
edge scenarios like the one below (a rough sketch of this failure mode follows the example):
```
Say the chunk size is 5 MB and the available disk space is 10 MB.
Thread 1 has written 3 MB.
Thread 2 has written 3 MB.
Thread 3 has written 3 MB.
Deadlock! Together the threads have consumed enough disk space to hit the
disk-space threshold, but no single thread has buffered enough data to reach
the chunk-size threshold, so none of them triggers an upload(), and all of
them block waiting for space that will never be freed.
```
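
For illustration, here is a rough Java sketch of that failure mode (names such as `DISK_BYTES`, `CHUNK_SIZE_BYTES`, and `bufferedBytes` are made up for this example and are not the actual Druid code): a byte-budget semaphore is only released when a chunk is uploaded, so writers that never reach the chunk threshold can exhaust the budget and block forever.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not the real RetryableS3OutputStream: shows why a
// byte-based disk budget can deadlock under concurrent writers.
class ByteBudgetDeadlockSketch
{
  private static final int CHUNK_SIZE_BYTES = 5 * 1024 * 1024;                  // 5 MB chunk threshold
  private static final Semaphore DISK_BYTES = new Semaphore(10 * 1024 * 1024);  // 10 MB disk budget

  private final ThreadLocal<Integer> bufferedBytes = ThreadLocal.withInitial(() -> 0);

  void write(byte[] data) throws InterruptedException
  {
    // Each writer reserves disk budget before buffering its bytes.
    DISK_BYTES.acquire(data.length);
    int buffered = bufferedBytes.get() + data.length;
    bufferedBytes.set(buffered);

    // Budget is only returned when a full chunk is uploaded. If three threads
    // each buffer 3 MB, 9 MB of the 10 MB budget is held, no thread has reached
    // the 5 MB chunk threshold, so no upload() runs, nothing is released, and
    // the next multi-MB write blocks forever.
    if (buffered >= CHUNK_SIZE_BYTES) {
      upload();
      DISK_BYTES.release(buffered);
      bufferedBytes.set(0);
    }
  }

  private void upload()
  {
    // Upload the buffered chunk to S3 (omitted in this sketch).
  }
}
```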
Based on yesterday's discussion with Karan, we concluded that we can use a
`number of chunks` criterion instead of bytes stored, and I have implemented
that approach via the `S3UploadConfig` class.
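
For clarity, here is a minimal sketch of the chunk-count idea (the field and method names are assumptions for the example, not necessarily the real `S3UploadConfig` or `RetryableS3OutputStream` API): a semaphore sized by the maximum number of concurrent chunks is acquired before a chunk is pushed and released once its upload completes, so the number of chunks buffered on local disk stays bounded regardless of how many bytes each writer has buffered.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of limiting concurrent chunks rather than buffered bytes.
class ChunkCountLimiterSketch
{
  private final Semaphore chunkPermits;

  ChunkCountLimiterSketch(int maxConcurrentNumChunks)
  {
    this.chunkPermits = new Semaphore(maxConcurrentNumChunks);
  }

  void pushChunk(Chunk chunk) throws InterruptedException
  {
    chunkPermits.acquire();      // block if too many chunks are already buffered on disk
    try {
      uploadPartToS3(chunk);     // upload the chunk (omitted in this sketch)
    }
    finally {
      chunk.delete();            // hypothetical cleanup of the local chunk file
      chunkPermits.release();    // free the slot so another chunk can be written
    }
  }

  private void uploadPartToS3(Chunk chunk)
  {
    // S3 multipart uploadPart call goes here.
  }

  interface Chunk
  {
    void delete();
  }
}
```

Bounding the chunk count rather than bytes avoids the deadlock above, because each permit is tied to a complete chunk that is guaranteed to be uploaded and then released.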
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]