gianm commented on code in PR #15007:
URL: https://github.com/apache/druid/pull/15007#discussion_r1341048323
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerMemoryParameters.java:
##########
@@ -130,11 +131,11 @@ public class WorkerMemoryParameters
  private static final long SMALL_WORKER_CAPACITY_THRESHOLD_BYTES = 256_000_000;

  /**
-  * Fraction of free memory per bundle that can be used by {@link org.apache.druid.msq.querykit.BroadcastJoinHelper}
-  * to store broadcast data on-heap. This is used to limit the total size of input frames, which we expect to
-  * expand on-heap. Expansion can potentially be somewhat over 2x: for example, strings are UTF-8 in frames, but are
-  * UTF-16 on-heap, which is a 2x expansion, and object and index overhead must be considered on top of that. So
-  * we use a value somewhat lower than 0.5.
+  * Fraction of free memory per bundle that can be used by {@link BroadcastJoinSegmentMapFnProcessor} to store broadcast
+  * data on-heap. This is used to limit the total size of input frames, which we expect to expand on-heap. Expansion
+  * can potentially be somewhat over 2x: for example, strings are UTF-8 in frames, but are UTF-16 on-heap, which is
+  * a 2x expansion, and object and index overhead must be considered on top of that. So we use a value somewhat
+  * lower than 0.5.
   */
  static final double BROADCAST_JOIN_MEMORY_FRACTION = 0.3;
Review Comment:
This makes sense in principle, although I want to hold off on it, because
right now the worker bundle is mostly taken up by items associated with the
shuffle. Moving this to the worker bundle would crowd that out.
I have been thinking generally of increasing the size of the worker bundle
and decreasing the size of the processing bundle, because the processors get to
use processing buffers, which are pretty big. There isn't much need for much
extra memory beyond that, so it is safe to shrink the processing bundles. The
main exception is `segmentGenerator`, which uses the processing bundle to build
and persist segments. But we can have that use worker-bundle memory by limiting
the # of segments built concurrently per task to 1, which is effectively
happening anyway for tasks.
Anyway, point is, I agree with you but I think it's not quite the right time
🙂
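To make the sizing reasoning in the javadoc concrete, here is a minimal, hypothetical sketch (the `BroadcastJoinMemoryExample` class and `maxBroadcastFrameBytes` helper are illustrative only, not Druid's actual API):

```java
public class BroadcastJoinMemoryExample
{
  // Fraction from the javadoc above: kept well below 0.5 because frame data
  // is expected to expand somewhat over 2x when brought on-heap (strings are
  // UTF-8 in frames but UTF-16 on-heap, plus object and index overhead).
  static final double BROADCAST_JOIN_MEMORY_FRACTION = 0.3;

  // Hypothetical helper: caps the total size of broadcast input frames so
  // that their on-heap expansion still fits within the bundle's free memory.
  static long maxBroadcastFrameBytes(final long freeMemoryPerBundle)
  {
    return (long) (freeMemoryPerBundle * BROADCAST_JOIN_MEMORY_FRACTION);
  }

  public static void main(String[] args)
  {
    final long freeMemoryPerBundle = 100_000_000L; // example: 100 MB free per bundle
    final long cap = maxBroadcastFrameBytes(freeMemoryPerBundle);
    // With a 30 MB frame cap, even a ~2.5x on-heap expansion (~75 MB)
    // stays within the 100 MB of free bundle memory.
    System.out.println(cap);
  }
}
```

This is just the "fraction of free memory" arithmetic; in Druid the actual accounting lives in `WorkerMemoryParameters`.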
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]