paul-rogers commented on code in PR #13707:
URL: https://github.com/apache/druid/pull/13707#discussion_r1088434276
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/kernel/StageDefinition.java:
##########
@@ -122,6 +124,7 @@
this.maxWorkerCount = maxWorkerCount;
this.shuffleCheckHasMultipleValues = shuffleCheckHasMultipleValues;
this.frameReader = Suppliers.memoize(() ->
FrameReader.create(signature))::get;
+ this.maxInputBytesPerWorker = maxInputBytesPerWorker;
Review Comment:
The stage definition is sent over the wire. In a rolling upgrade, what
happens if an older controller sends a task without this value? Should we use a
`Long` and use the default if the value is null (as it will be if the worker
task is missing this field)?
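
As a rough illustration of the suggestion (a hedged sketch, not the actual Druid code): accepting a boxed `Long` lets the JSON deserializer leave the field null when an older controller's payload omits it, and the constructor can then fall back to a default. The class name and default constant below are hypothetical.

```java
// Hedged sketch of the reviewer's suggestion; not the real StageDefinition.
public class StageDefinitionSketch
{
  // Hypothetical default for illustration; the PR's actual default may differ.
  static final long DEFAULT_MAX_INPUT_BYTES_PER_WORKER = 10L * 1024 * 1024 * 1024;

  private final long maxInputBytesPerWorker;

  // In the real class this parameter would carry
  // @JsonProperty("maxInputBytesPerWorker"), so Jackson passes null when an
  // older controller's JSON is missing the field.
  public StageDefinitionSketch(final Long maxInputBytesPerWorker)
  {
    this.maxInputBytesPerWorker = maxInputBytesPerWorker == null
                                  ? DEFAULT_MAX_INPUT_BYTES_PER_WORKER
                                  : maxInputBytesPerWorker;
  }

  public long getMaxInputBytesPerWorker()
  {
    return maxInputBytesPerWorker;
  }
}
```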
##########
docs/multi-stage-query/reference.md:
##########
@@ -602,6 +602,7 @@ The following table lists the context parameters for the
MSQ task engine:
| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on
fault tolerance mode or not. Failed workers are retried based on
[Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly
set to false. | `false` |
| `composedIntermediateSuperSorterStorageEnabled` | SELECT, INSERT, REPLACE<br
/><br /> Whether to enable automatic fallback from local storage to durable
storage for sorting's intermediate data. Requires the
`intermediateSuperSorterStorageMaxLocalBytes` limit to be set for local storage
and durable shuffle storage to be enabled. | `false` |
| `intermediateSuperSorterStorageMaxLocalBytes` | SELECT, INSERT, REPLACE<br
/><br /> The maximum number of bytes of local storage that sorting's
intermediate data may use. If this limit is exceeded, the task fails with
`ResourceLimitExceededException`.| `9223372036854775807` |
+| `maxInputBytesPerWorker` | Maximum number of input bytes per worker, used
when assigning input slices to tasks. Applies only when the number of tasks is
determined automatically. | `10 GB` |
Review Comment:
Can we improve this? What does this limit? We casual readers can't figure
out what "input bytes" means. Input files? Input frames? Something else?
Is this a limit on total input bytes? Probably not. On buffers used to store
a chunk of input? On the length of the longest input row? What problem occurs
if this is too small?
Why would I want to reduce this? 10 GB is pretty big. Is this a memory
limit? Seems big. Or is it the largest aggregate input-frame size that the
worker will accept? As you can see, I'm guessing: please explain the actual
meaning.
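
For what it's worth, one plausible reading of "used when assigning input slices to tasks" (an assumption on my part, not confirmed by the PR) is that the controller picks a worker count by dividing the total input size by this cap, bounded by `maxWorkerCount`. The method below is a hypothetical sketch of that reading, not Druid's actual logic.

```java
// Hedged sketch: how a per-worker input-byte cap *might* drive automatic
// worker-count selection. The real Druid algorithm may differ.
public class WorkerCountSketch
{
  static int computeWorkerCount(
      final long totalInputBytes,
      final long maxInputBytesPerWorker,
      final int maxWorkerCount
  )
  {
    // Ceiling division: enough workers that no worker exceeds the byte cap.
    final long needed =
        (totalInputBytes + maxInputBytesPerWorker - 1) / maxInputBytesPerWorker;

    // At least one worker; never more than the configured maximum.
    return (int) Math.max(1, Math.min(needed, maxWorkerCount));
  }
}
```

Under this reading, shrinking the value spreads the same input across more (smaller) tasks, while the 10 GB default keeps worker counts low for modest inputs.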
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]