yashmayya commented on code in PR #13164:
URL: https://github.com/apache/pinot/pull/13164#discussion_r1609726180


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java:
##########
@@ -411,4 +407,18 @@ public String[][] transformToStringValuesMV(ValueBlock 
valueBlock) {
     }
     return _stringValuesMV;
   }
+
+  /**
+   * Lazily initialize _valueToMatchingDocsMap, so that map generation is 
skipped when filtering excludes all values
+   */
+  private Map<String, RoaringBitmap> getValueToMatchingDocsMap() {

Review Comment:
   Just for my understanding, this isn't referring to the filtering done by the 
`JsonExtractIndexTransformFunction` itself right (because that's only done 
after the map is generated in the `JsonIndexReader`)? So is this referring to 
query level filters that might result in none of this transform function's 
`transformToXyz` methods being called (and I presume the transform function's 
`init` method is still called beforehand)?



##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -368,6 +379,105 @@ public void testGetDataTableOOMGroupBy()
     Assert.assertTrue(earlyTerminationOccurred.get());
   }
 
+  /**
+   * Test instrumentation in getMatchingFlattenedDocsMap() from
+   * {@link org.apache.pinot.segment.spi.index.reader.JsonIndexReader}
+   *
+   * Since getMatchingFlattenedDocsMap() can collect a large map before 
processing any blocks, it is required to
+   * check for OOM during map generation. This test generates a mutable and 
immutable json index, and generates a map
+   * as would happen in json_extract_index execution.
+   *
+   * It is roughly equivalent to running json_extract_index(col, '$.key', 
'STRING').
+   */
+  @Test
+  public void testJsonIndexExtractMapOOM()
+      throws Exception {
+    HashMap<String, Object> configs = new HashMap<>();
+    ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+    ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+    
LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.OFF);
+    
LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.OFF);
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO,
 0.00f);
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO,
 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+        "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING, 
true);
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING, 
false);
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY, 
true);
+    
configs.put(CommonConstants.Accounting.CONFIG_OF_MIN_MEMORY_FOOTPRINT_TO_KILL_RATIO,
 0.00f);
+
+    PinotConfiguration config = getConfig(2, 2, configs);
+    ResourceManager rm = getResourceManager(2, 2, 1, 1, configs);
+    // init accountant and start watcher task
+    Tracing.ThreadAccountantOps.initializeThreadAccountant(config, 
"testJsonIndexExtractMapOOM");
+
+    Supplier<String> randomJsonValue = () -> {
+      Random random = new Random();
+      int length = random.nextInt(1000);
+      StringBuilder sb = new StringBuilder();
+      for (int i = 0; i < length; i++) {
+        sb.append((char) (random.nextInt(26) + 'a'));
+      }
+      return "{\"key\":\"" + sb + "\"}";
+    };
+
+    File indexDir = new File(FileUtils.getTempDirectory(), 
"testJsonIndexExtractMapOOM");
+    FileUtils.forceMkdir(indexDir);
+    String colName = "col";
+    try (JsonIndexCreator offHeapIndexCreator = new 
OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig());
+        MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new 
JsonIndexConfig())) {
+      // build json indexes
+      for (int i = 0; i < 1000000; i++) {

Review Comment:
   Why do we need such a large value here even though we're configuring the 
query killer's heap usage ratio to kill queries on to be `0.00f`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to