paul-rogers commented on code in PR #13707:
URL: https://github.com/apache/druid/pull/13707#discussion_r1088434276
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/kernel/StageDefinition.java:
##########
@@ -122,6 +124,7 @@
this.maxWorkerCount = maxWorkerCount;
this.shuffleCheckHasMultipleValues = shuffleCheckHasMultipleValues;
this.frameReader = Suppliers.memoize(() ->
FrameReader.create(signature))::get;
+ this.maxInputBytesPerWorker = maxInputBytesPerWorker;
Review Comment:
The stage definition is sent over the wire. In a rolling upgrade, what
happens if an older controller sends a task without this value? Should we use a
`Long` and use the default if the value is null (as it will be if the worker
task is missing this field)?
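
As a rough illustration of the suggestion (a hedged sketch, not the actual Druid code): accepting a boxed `Long` lets the JSON deserializer leave the field null when an older controller's payload omits it, and the constructor can then fall back to a default. The class name and default constant below are hypothetical.

```java
// Hedged sketch of the reviewer's suggestion; not the real StageDefinition.
public class StageDefinitionSketch
{
  // Hypothetical default for illustration; the PR's actual default may differ.
  static final long DEFAULT_MAX_INPUT_BYTES_PER_WORKER = 10L * 1024 * 1024 * 1024;

  private final long maxInputBytesPerWorker;

  // In the real class this parameter would carry
  // @JsonProperty("maxInputBytesPerWorker"), so Jackson passes null when an
  // older controller's JSON is missing the field.
  public StageDefinitionSketch(final Long maxInputBytesPerWorker)
  {
    this.maxInputBytesPerWorker = maxInputBytesPerWorker == null
                                  ? DEFAULT_MAX_INPUT_BYTES_PER_WORKER
                                  : maxInputBytesPerWorker;
  }

  public long getMaxInputBytesPerWorker()
  {
    return maxInputBytesPerWorker;
  }
}
```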
##########
docs/multi-stage-query/reference.md:
##########
@@ -602,6 +602,7 @@ The following table lists the context parameters for the
MSQ task engine:
| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on
fault tolerance mode or not. Failed workers are retried based on
[Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly
set to false. | `false` |
| `composedIntermediateSuperSorterStorageEnabled` | SELECT, INSERT, REPLACE<br
/><br /> Whether to enable automatic fallback from local storage to durable
storage for sorting's intermediate data. Requires the
`intermediateSuperSorterStorageMaxLocalBytes` limit to be set for local storage
and durable shuffle storage to be enabled. | `false` |
| `intermediateSuperSorterStorageMaxLocalBytes` | SELECT, INSERT, REPLACE<br
/><br /> The maximum number of bytes of local storage that sorting's
intermediate data may use. If this limit is exceeded, the task fails with
`ResourceLimitExceededException`.| `9223372036854775807` |
+| `maxInputBytesPerWorker` | Maximum number of input bytes per worker, used
when assigning input slices to tasks. Applies only when the number of tasks is
determined automatically. | `10 GB` |
Review Comment:
Can we improve this? What does this limit? We casual readers can't figure
out what "input bytes" means. Input files? Input frames? Something else?
Is this a limit on total input bytes? Probably not. On buffers used to store
a chunk of input? On the length of the longest input row? What problem occurs
if this is too small?
Why would I want to reduce this? 10 GB is pretty big. Is this a memory
limit? Seems big. Or is it the largest aggregate input-frame size that the
worker will accept? As you can see, I'm guessing: please explain the actual
meaning.
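
For what it's worth, one plausible reading of "used when assigning input slices to tasks" (an assumption on my part, not confirmed by the PR) is that the controller picks a worker count by dividing the total input size by this cap, bounded by `maxWorkerCount`. The method below is a hypothetical sketch of that reading, not Druid's actual logic.

```java
// Hedged sketch: how a per-worker input-byte cap *might* drive automatic
// worker-count selection. The real Druid algorithm may differ.
public class WorkerCountSketch
{
  static int computeWorkerCount(
      final long totalInputBytes,
      final long maxInputBytesPerWorker,
      final int maxWorkerCount
  )
  {
    // Ceiling division: enough workers that no worker exceeds the byte cap.
    final long needed =
        (totalInputBytes + maxInputBytesPerWorker - 1) / maxInputBytesPerWorker;

    // At least one worker; never more than the configured maximum.
    return (int) Math.max(1, Math.min(needed, maxWorkerCount));
  }
}
```

Under this reading, shrinking the value spreads the same input across more (smaller) tasks, while the 10 GB default keeps worker counts low for modest inputs.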
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]