Re: [PR] MSQ controller: Support in-memory shuffles; towards JVM reuse. (druid)

via GitHub Sun, 28 Apr 2024 22:02:20 -0700


LakshSingla commented on code in PR #16168:
URL: https://github.com/apache/druid/pull/16168#discussion_r1573215860



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MemoryIntrospector.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.exec;
+
+import org.apache.druid.msq.kernel.WorkOrder;
+
+/**
+ * Introspector used to generate {@link ControllerMemoryParameters}.
+ */
+public interface MemoryIntrospector
+{
+  /**
+   * Amount of total memory in the entire JVM.
+   */
+  long totalMemoryInJvm();
+
+  /**
+   * Amount of memory usable for the multi-stage query engine in the entire 
JVM.
+   *
+   * This may be an expensive operation. For example, the production 
implementation {@link MemoryIntrospectorImpl}
+   * estimates size of all lookups as part of computing this value.
+   */
+  long usableMemoryInJvm();
+
+  /**
+   * Amount of total JVM memory required for a particular amount of usable 
memory to be available.
+   *
+   * This may be an expensive operation. For example, the production 
implementation {@link MemoryIntrospectorImpl}
+   * estimates size of all lookups as part of computing this value.
+   */
+  long jvmMemoryRequiredForUsableMemory(long usableMemory);
+
+  /**
+   * Maximum number of queries that run simultaneously in this JVM.
+   *
+   * On workers, this is the maximum number of {@link Worker} that run 
simultaneously. See
+   * {@link WorkerMemoryParameters} for how memory is divided among and within 
{@link WorkOrder} run by a worker.

Review Comment:
   It seems slightly off - a single query can be run simultaneously on multiple 
tasks. `numQueriesInJvm` should technically be 1, but by the definition in the 
Javadoc, it might return > 1



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/guice/PeonMemoryManagementModule.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.guice;
+
+import com.google.inject.Binder;
+import com.google.inject.Provides;
+import org.apache.druid.discovery.NodeRole;
+import org.apache.druid.frame.processor.Bouncer;
+import org.apache.druid.guice.LazySingleton;
+import org.apache.druid.guice.annotations.LoadScope;
+import org.apache.druid.initialization.DruidModule;
+import org.apache.druid.msq.exec.MemoryIntrospector;
+import org.apache.druid.msq.exec.MemoryIntrospectorImpl;
+import org.apache.druid.query.lookup.LookupExtractorFactoryContainerProvider;
+import org.apache.druid.utils.JvmUtils;
+
+/**
+ * Provides {@link MemoryIntrospector} for single-task-per-JVM model.
+ *
+ * @see IndexerMemoryManagementModule for multi-task-per-JVM model used on 
{@link org.apache.druid.cli.CliIndexer}
+ */
+@LoadScope(roles = NodeRole.PEON_JSON_NAME)
+public class PeonMemoryManagementModule implements DruidModule
+{
+  /**
+   * Peons have a single worker per JVM.
+   */
+  private static final int NUM_WORKERS_IN_JVM = 1;
+
+  /**
+   * Peons may have more than one processing thread, but we currently only use 
one of them.
+   */
+  private static final int NUM_PROCESSING_THREADS = 1;

Review Comment:
   I think this hasn't changed from the original MSQ design, but is there any 
reason why we use a single thread in the Peon? Why are the other threads not 
utilized? 
   
   Also, in the indexer, we use all the threads, but as per my reasoning, we 
should use a single thread, per process in the indexer (since other processes 
share that). What's the fallacy in my thinking? 



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/guice/IndexerMemoryManagementModule.java:
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.guice;
+
+import com.google.inject.Binder;
+import com.google.inject.Provides;
+import org.apache.druid.discovery.NodeRole;
+import org.apache.druid.frame.processor.Bouncer;
+import org.apache.druid.guice.LazySingleton;
+import org.apache.druid.guice.annotations.LoadScope;
+import org.apache.druid.indexing.worker.config.WorkerConfig;
+import org.apache.druid.initialization.DruidModule;
+import org.apache.druid.msq.exec.MemoryIntrospector;
+import org.apache.druid.msq.exec.MemoryIntrospectorImpl;
+import org.apache.druid.query.DruidProcessingConfig;
+import org.apache.druid.query.lookup.LookupExtractorFactoryContainerProvider;
+import org.apache.druid.utils.JvmUtils;
+
+/**
+ * Provides {@link MemoryIntrospector} for multi-task-per-JVM model.
+ *
+ * @see PeonMemoryManagementModule for single-task-per-JVM model used on 
{@link org.apache.druid.cli.CliPeon}
+ */
+@LoadScope(roles = NodeRole.INDEXER_JSON_NAME)
+public class IndexerMemoryManagementModule implements DruidModule
+{
+  /**
+   * Allocate up to 75% of memory for MSQ-related stuff (if all running tasks 
are MSQ tasks).
+   */
+  private static final double USABLE_MEMORY_FRACTION = 0.75;
+
+  @Override
+  public void configure(Binder binder)
+  {
+    // Nothing to do.
+  }
+
+  @Provides
+  @LazySingleton

Review Comment:
   Since it's a singleton, we should probably rename the method to what 
resource this bouncer guards. Else there can be confusion as to whether we want 
to use the global bouncer or create one locally. 
   Digging into the code, it seems to be guarding the number of the processors 
that can run concurrently, so something like `makeProcessorBouncer` seems 
non-ambiguous.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] MSQ controller: Support in-memory shuffles; towards JVM reuse. (druid)

Reply via email to