liran-funaro commented on a change in pull request #10001:
URL: https://github.com/apache/druid/pull/10001#discussion_r795170564



##########
File path: extensions-contrib/oak-incremental-index/src/main/java/org/apache/druid/segment/incremental/oak/OakIncrementalIndexSpec.java
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.incremental.oak;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.segment.incremental.AppendableIndexSpec;
+import org.apache.druid.utils.JvmUtils;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+/**
+ * Oak incremental index spec (describes the in-memory indexing method for data ingestion).
+ */
+public class OakIncrementalIndexSpec implements AppendableIndexSpec
+{
+  public static final String TYPE = "oak";
+
+  final long oakMaxMemoryCapacity;
+  final int oakBlockSize;
+  final int oakChunkMaxItems;
+
+  @JsonCreator
+  public OakIncrementalIndexSpec(
+      final @JsonProperty("oakMaxMemoryCapacity") @Nullable Long oakMaxMemoryCapacity,
+      final @JsonProperty("oakBlockSize") @Nullable Integer oakBlockSize,
+      final @JsonProperty("oakChunkMaxItems") @Nullable Integer oakChunkMaxItems
+  )
+  {
+    this.oakMaxMemoryCapacity = oakMaxMemoryCapacity != null && oakMaxMemoryCapacity > 0 ? oakMaxMemoryCapacity :
+        OakIncrementalIndex.Builder.DEFAULT_OAK_MAX_MEMORY_CAPACITY;
+    this.oakBlockSize = oakBlockSize != null && oakBlockSize > 0 ? oakBlockSize :
+        OakIncrementalIndex.Builder.DEFAULT_OAK_BLOCK_SIZE;
+    this.oakChunkMaxItems = oakChunkMaxItems != null && oakChunkMaxItems > 0 ? oakChunkMaxItems :
+        OakIncrementalIndex.Builder.DEFAULT_OAK_CHUNK_MAX_ITEMS;
+  }
+
+  @JsonProperty
+  public long getOakMaxMemoryCapacity()
+  {
+    return oakMaxMemoryCapacity;
+  }
+
+  @JsonProperty
+  public int getOakBlockSize()
+  {
+    return oakBlockSize;
+  }
+
+  @JsonProperty
+  public int getOakChunkMaxItems()
+  {
+    return oakChunkMaxItems;
+  }
+
+  @Nonnull
+  @Override
+  public OakIncrementalIndex.Builder builder()
+  {
+    return new OakIncrementalIndex.Builder()
+        .setOakMaxMemoryCapacity(oakMaxMemoryCapacity)
+        .setOakBlockSize(oakBlockSize)
+        .setOakChunkMaxItems(oakChunkMaxItems);
+  }
+
+  @Override
+  public long getDefaultMaxBytesInMemory()
+  {
+    // Oak allocates its keys/values directly so the JVM off-heap limitations does not apply on it.
+    // Yet, we want to respect these values if the user did not specify any specific limitation.
+    // In the realtime node, the entire JVM's direct memory is utilized for ingestion and persist operations.
+    // But maxBytesInMemory only refers to the active index size and not to the index being flushed to disk and the
+    // persist-buffer.
+    // To account for that, we set default to 1/2 of the max jvm's direct memory.
+    return JvmUtils.getRuntimeInfo().getDirectMemorySizeBytes() / 2;

Review comment:
   It's a great question. In fact, I am not sure the direct memory limit applies here at all.
   Oak allocates its memory directly, so it is not subject to any JVM limit (only to the OS memory limit).
   So I'm not sure what a suitable default value would be.
   An easy solution would be to throw an exception if the user does not set this value manually (`maxBytesInMemory==0`).
   Alternatively, we could rely on `maxRowsInMemory` to limit the usage and leave the memory unlimited (same as `maxBytesInMemory==-1`).
   I think the best approach is to add a middle-manager configuration that sets the default memory limit for this extension, because only the system administrator knows the true memory limits of the machine.
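   To make the three options concrete, here is a rough sketch of how the default could be resolved. It is only an illustration: the class `OakDefaultMemorySketch`, the `Policy` enum, and the `adminDefaultBytes` parameter are hypothetical names that do not exist in Druid or in this PR. It assumes, as above, that `maxBytesInMemory==0` means "not set" and `-1` means "unlimited".

```java
import java.util.OptionalLong;

// Hypothetical sketch, not part of Druid or this PR: it only compares the
// three ways of choosing a default when the user leaves maxBytesInMemory unset.
final class OakDefaultMemorySketch
{
  // The three options discussed in the comment above (names are assumptions).
  enum Policy { REQUIRE_EXPLICIT, UNLIMITED, ADMIN_DEFAULT }

  // Assumed convention: -1 means "no byte limit", 0 means "not set by the user".
  static final long UNLIMITED = -1L;

  static long resolve(long maxBytesInMemory, Policy policy, OptionalLong adminDefaultBytes)
  {
    if (maxBytesInMemory != 0) {
      // The user chose a value explicitly (a positive limit or -1), so keep it.
      return maxBytesInMemory;
    }
    switch (policy) {
      case REQUIRE_EXPLICIT:
        // Option 1: refuse to guess a default.
        throw new IllegalArgumentException("maxBytesInMemory must be set explicitly for the oak index");
      case UNLIMITED:
        // Option 2: no byte limit; maxRowsInMemory is left to bound the usage.
        return UNLIMITED;
      case ADMIN_DEFAULT:
        // Option 3: use a limit configured by the system administrator.
        return adminDefaultBytes.orElseThrow(
            () -> new IllegalStateException("no default memory limit configured")
        );
      default:
        throw new AssertionError(policy);
    }
  }
}
```

   With the third option, `resolve(0, Policy.ADMIN_DEFAULT, OptionalLong.of(limit))` returns the administrator's limit, and an explicit user value always takes precedence.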
   @pjain1 @yurmix @a2l007 What do you think is the best approach here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


