gianm commented on code in PR #15007:
URL: https://github.com/apache/druid/pull/15007#discussion_r1341048323
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerMemoryParameters.java:
##########
@@ -130,11 +131,11 @@ public class WorkerMemoryParameters
  private static final long SMALL_WORKER_CAPACITY_THRESHOLD_BYTES = 256_000_000;

  /**
-  * Fraction of free memory per bundle that can be used by {@link org.apache.druid.msq.querykit.BroadcastJoinHelper}
-  * to store broadcast data on-heap. This is used to limit the total size of input frames, which we expect to
-  * expand on-heap. Expansion can potentially be somewhat over 2x: for example, strings are UTF-8 in frames, but are
-  * UTF-16 on-heap, which is a 2x expansion, and object and index overhead must be considered on top of that. So
-  * we use a value somewhat lower than 0.5.
+  * Fraction of free memory per bundle that can be used by {@link BroadcastJoinSegmentMapFnProcessor} to store broadcast
+  * data on-heap. This is used to limit the total size of input frames, which we expect to expand on-heap. Expansion
+  * can potentially be somewhat over 2x: for example, strings are UTF-8 in frames, but are UTF-16 on-heap, which is
+  * a 2x expansion, and object and index overhead must be considered on top of that. So we use a value somewhat
+  * lower than 0.5.
   */
  static final double BROADCAST_JOIN_MEMORY_FRACTION = 0.3;
Review Comment:
This makes sense in principle, although I want to hold off on it, because
right now the worker bundle is mostly taken up by items associated with the
shuffle. Moving this to the worker bundle would crowd that out.
I have been thinking generally of increasing the size of the worker bundle
and decreasing the size of the processing bundle, because the processors get to
use processing buffers, which are pretty big. There isn't much need for much
extra memory beyond that, so it is safe to shrink the processing bundles. The
main exception is `segmentGenerator`, which uses the processing bundle to build
and persist segments. But we can have that use worker-bundle memory by limiting
the # of segments built concurrently per task to 1, which is effectively
happening anyway for tasks.
Anyway, point is, I agree with you but I think it's not quite the right time
🙂
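To make the sizing reasoning in the javadoc concrete, here is a minimal, hypothetical sketch (the `BroadcastJoinMemoryExample` class and `maxBroadcastFrameBytes` helper are illustrative only, not Druid's actual API):

```java
public class BroadcastJoinMemoryExample
{
  // Fraction from the javadoc above: kept well below 0.5 because frame data
  // is expected to expand somewhat over 2x when brought on-heap (strings are
  // UTF-8 in frames but UTF-16 on-heap, plus object and index overhead).
  static final double BROADCAST_JOIN_MEMORY_FRACTION = 0.3;

  // Hypothetical helper: caps the total size of broadcast input frames so
  // that their on-heap expansion still fits within the bundle's free memory.
  static long maxBroadcastFrameBytes(final long freeMemoryPerBundle)
  {
    return (long) (freeMemoryPerBundle * BROADCAST_JOIN_MEMORY_FRACTION);
  }

  public static void main(String[] args)
  {
    final long freeMemoryPerBundle = 100_000_000L; // example: 100 MB free per bundle
    final long cap = maxBroadcastFrameBytes(freeMemoryPerBundle);
    // With a 30 MB frame cap, even a ~2.5x on-heap expansion (~75 MB)
    // stays within the 100 MB of free bundle memory.
    System.out.println(cap);
  }
}
```

This is just the "fraction of free memory" arithmetic; in Druid the actual accounting lives in `WorkerMemoryParameters`.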
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]