LakshSingla commented on code in PR #16168: URL: https://github.com/apache/druid/pull/16168#discussion_r1573215860
########## extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MemoryIntrospector.java: ########## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.msq.exec; + +import org.apache.druid.msq.kernel.WorkOrder; + +/** + * Introspector used to generate {@link ControllerMemoryParameters}. + */ +public interface MemoryIntrospector +{ + /** + * Amount of total memory in the entire JVM. + */ + long totalMemoryInJvm(); + + /** + * Amount of memory usable for the multi-stage query engine in the entire JVM. + * + * This may be an expensive operation. For example, the production implementation {@link MemoryIntrospectorImpl} + * estimates size of all lookups as part of computing this value. + */ + long usableMemoryInJvm(); + + /** + * Amount of total JVM memory required for a particular amount of usable memory to be available. + * + * This may be an expensive operation. For example, the production implementation {@link MemoryIntrospectorImpl} + * estimates size of all lookups as part of computing this value. + */ + long jvmMemoryRequiredForUsableMemory(long usableMemory); + + /** + * Maximum number of queries that run simultaneously in this JVM. + * + * On workers, this is the maximum number of {@link Worker} that run simultaneously. See + * {@link WorkerMemoryParameters} for how memory is divided among and within {@link WorkOrder} run by a worker. Review Comment: It seems slightly off - a single query can be run simultaneously on multiple tasks. `numQueriesInJvm` should technically be 1, but by the definition in the Javadoc, it might return > 1 ########## extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/guice/PeonMemoryManagementModule.java: ########## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.msq.guice; + +import com.google.inject.Binder; +import com.google.inject.Provides; +import org.apache.druid.discovery.NodeRole; +import org.apache.druid.frame.processor.Bouncer; +import org.apache.druid.guice.LazySingleton; +import org.apache.druid.guice.annotations.LoadScope; +import org.apache.druid.initialization.DruidModule; +import org.apache.druid.msq.exec.MemoryIntrospector; +import org.apache.druid.msq.exec.MemoryIntrospectorImpl; +import org.apache.druid.query.lookup.LookupExtractorFactoryContainerProvider; +import org.apache.druid.utils.JvmUtils; + +/** + * Provides {@link MemoryIntrospector} for single-task-per-JVM model. + * + * @see IndexerMemoryManagementModule for multi-task-per-JVM model used on {@link org.apache.druid.cli.CliIndexer} + */ +@LoadScope(roles = NodeRole.PEON_JSON_NAME) +public class PeonMemoryManagementModule implements DruidModule +{ + /** + * Peons have a single worker per JVM. + */ + private static final int NUM_WORKERS_IN_JVM = 1; + + /** + * Peons may have more than one processing thread, but we currently only use one of them. + */ + private static final int NUM_PROCESSING_THREADS = 1; Review Comment: I think this hasn't changed from the original MSQ design, but is there any reason why we use a single thread in the Peon? Why are the other threads not utilized? Also, in the indexer, we use all the threads, but as per my reasoning, we should use a single thread, per process in the indexer (since other processes share that). What's the fallacy in my thinking? ########## extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/guice/IndexerMemoryManagementModule.java: ########## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.msq.guice; + +import com.google.inject.Binder; +import com.google.inject.Provides; +import org.apache.druid.discovery.NodeRole; +import org.apache.druid.frame.processor.Bouncer; +import org.apache.druid.guice.LazySingleton; +import org.apache.druid.guice.annotations.LoadScope; +import org.apache.druid.indexing.worker.config.WorkerConfig; +import org.apache.druid.initialization.DruidModule; +import org.apache.druid.msq.exec.MemoryIntrospector; +import org.apache.druid.msq.exec.MemoryIntrospectorImpl; +import org.apache.druid.query.DruidProcessingConfig; +import org.apache.druid.query.lookup.LookupExtractorFactoryContainerProvider; +import org.apache.druid.utils.JvmUtils; + +/** + * Provides {@link MemoryIntrospector} for multi-task-per-JVM model. + * + * @see PeonMemoryManagementModule for single-task-per-JVM model used on {@link org.apache.druid.cli.CliPeon} + */ +@LoadScope(roles = NodeRole.INDEXER_JSON_NAME) +public class IndexerMemoryManagementModule implements DruidModule +{ + /** + * Allocate up to 75% of memory for MSQ-related stuff (if all running tasks are MSQ tasks). + */ + private static final double USABLE_MEMORY_FRACTION = 0.75; + + @Override + public void configure(Binder binder) + { + // Nothing to do. + } + + @Provides + @LazySingleton Review Comment: Since it's a singleton, we should probably rename the method to what resource this bouncer guards. Else there can be confusion as to whether we want to use the global bouncer or create one locally. Digging into the code, it seems to be guarding the number of the processors that can run concurrently, so something like `makeProcessorBouncer` seems non-ambiguous. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
