Re: [PR] add instrumentation to json index getMatchingFlattenedDocsMap() [pinot]

via GitHub Wed, 22 May 2024 11:07:39 -0700


itschrispeck commented on code in PR #13164:
URL: https://github.com/apache/pinot/pull/13164#discussion_r1610433457



##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -368,6 +379,105 @@ public void testGetDataTableOOMGroupBy()
     Assert.assertTrue(earlyTerminationOccurred.get());
   }
 
+  /**
+   * Test instrumentation in getMatchingFlattenedDocsMap() from
+   * {@link org.apache.pinot.segment.spi.index.reader.JsonIndexReader}
+   *
+   * Since getMatchingFlattenedDocsMap() can collect a large map before processing any blocks, it is required to
+   * check for OOM during map generation. This test generates a mutable and immutable json index, and generates a map
+   * as would happen in json_extract_index execution.
+   *
+   * It is roughly equivalent to running json_extract_index(col, '$.key', 'STRING').
+   */
+  @Test
+  public void testJsonIndexExtractMapOOM()
+      throws Exception {
+    HashMap<String, Object> configs = new HashMap<>();
+    ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+    ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+    LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.OFF);
+    LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.OFF);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+        "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING, true);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING, false);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY, true);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_MIN_MEMORY_FOOTPRINT_TO_KILL_RATIO, 0.00f);
+
+    PinotConfiguration config = getConfig(2, 2, configs);
+    ResourceManager rm = getResourceManager(2, 2, 1, 1, configs);
+    // init accountant and start watcher task
+    Tracing.ThreadAccountantOps.initializeThreadAccountant(config, "testJsonIndexExtractMapOOM");
+
+    Supplier<String> randomJsonValue = () -> {
+      Random random = new Random();
+      int length = random.nextInt(1000);
+      StringBuilder sb = new StringBuilder();
+      for (int i = 0; i < length; i++) {
+        sb.append((char) (random.nextInt(26) + 'a'));
+      }
+      return "{\"key\":\"" + sb + "\"}";
+    };
+
+    File indexDir = new File(FileUtils.getTempDirectory(), "testJsonIndexExtractMapOOM");
+    FileUtils.forceMkdir(indexDir);
+    String colName = "col";
+    try (JsonIndexCreator offHeapIndexCreator = new OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig());
+        MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new JsonIndexConfig())) {
+      // build json indexes
+      for (int i = 0; i < 1000000; i++) {

Review Comment:
   My guess is that the more frequent GCs on GitHub runners make thread memory 
hard to measure. With logs enabled, I see `But all queries are below quota, no 
query killed` a couple of times before the query is picked. So my assumption is 
that GCs are causing the thread memory measurements to come out lower than the 
initial/previous measurements.
   
   Locally, it works reliably at 100k, but it failed when I initially pushed.
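
One way to check the guess above outside the test would be to log a per-thread allocation reading before and after the index build on both a local machine and a runner. The following is only a minimal sketch, assuming the thread memory figure is comparable to the per-thread allocation counter that HotSpot exposes through `com.sun.management.ThreadMXBean`. `ThreadAllocationProbe` and `allocatedBytesDuring` are hypothetical names for illustration and are not part of Pinot.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ThreadAllocationProbe {

  /**
   * Hypothetical helper: returns the bytes allocated by the calling thread while
   * the task runs, or -1 if this JVM cannot report per-thread allocation.
   */
  public static long allocatedBytesDuring(Runnable task) {
    java.lang.management.ThreadMXBean base = ManagementFactory.getThreadMXBean();
    // The allocation counter lives on the HotSpot-specific extension interface.
    if (!(base instanceof com.sun.management.ThreadMXBean)) {
      return -1L;
    }
    com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) base;
    if (!bean.isThreadAllocatedMemorySupported()) {
      return -1L;
    }
    if (!bean.isThreadAllocatedMemoryEnabled()) {
      bean.setThreadAllocatedMemoryEnabled(true);
    }
    long threadId = Thread.currentThread().getId();
    long before = bean.getThreadAllocatedBytes(threadId);
    task.run();
    long after = bean.getThreadAllocatedBytes(threadId);
    // getThreadAllocatedBytes returns -1 when measurement is unavailable.
    return (before < 0 || after < 0) ? -1L : after - before;
  }

  public static void main(String[] args) {
    // Keep the strings reachable so the allocations are not optimized away.
    List<String> retained = new ArrayList<>();
    long bytes = allocatedBytesDuring(() -> {
      for (int i = 0; i < 100_000; i++) {
        retained.add("{\"key\":\"" + i + "\"}");
      }
    });
    System.out.println("allocated bytes: " + bytes + ", retained: " + retained.size());
  }
}
```

Comparing the logged deltas between a local run and a runner would show whether the readings really differ there, or whether the difference comes from somewhere else in the accounting path.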




