itschrispeck commented on code in PR #13164:
URL: https://github.com/apache/pinot/pull/13164#discussion_r1610433457
##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -368,6 +379,105 @@ public void testGetDataTableOOMGroupBy()
Assert.assertTrue(earlyTerminationOccurred.get());
}
+ /**
+ * Test instrumentation in getMatchingFlattenedDocsMap() from
+ * {@link org.apache.pinot.segment.spi.index.reader.JsonIndexReader}
+ *
+ * Since getMatchingFlattenedDocsMap() can collect a large map before
processing any blocks, it is required to
+ * check for OOM during map generation. This test generates a mutable and
immutable json index, and generates a map
+ * as would happen in json_extract_index execution.
+ *
+ * It is roughly equivalent to running json_extract_index(col, '$.key',
'STRING').
+ */
+ @Test
+ public void testJsonIndexExtractMapOOM()
+ throws Exception {
+ HashMap<String, Object> configs = new HashMap<>();
+ ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+ ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+
LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.OFF);
+
LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.OFF);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO,
0.00f);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO,
0.00f);
+ configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+ "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING,
true);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING,
false);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY,
true);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_MIN_MEMORY_FOOTPRINT_TO_KILL_RATIO,
0.00f);
+
+ PinotConfiguration config = getConfig(2, 2, configs);
+ ResourceManager rm = getResourceManager(2, 2, 1, 1, configs);
+ // init accountant and start watcher task
+ Tracing.ThreadAccountantOps.initializeThreadAccountant(config,
"testJsonIndexExtractMapOOM");
+
+ Supplier<String> randomJsonValue = () -> {
+ Random random = new Random();
+ int length = random.nextInt(1000);
+ StringBuilder sb = new StringBuilder();
+ for (int i = 0; i < length; i++) {
+ sb.append((char) (random.nextInt(26) + 'a'));
+ }
+ return "{\"key\":\"" + sb + "\"}";
+ };
+
+ File indexDir = new File(FileUtils.getTempDirectory(),
"testJsonIndexExtractMapOOM");
+ FileUtils.forceMkdir(indexDir);
+ String colName = "col";
+ try (JsonIndexCreator offHeapIndexCreator = new
OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig());
+ MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new
JsonIndexConfig())) {
+ // build json indexes
+ for (int i = 0; i < 1000000; i++) {
Review Comment:
My guess is that the more frequent GCs in github runners make measuring
thread memory difficult. When logs are enabled, I can see `But all queries are
below quota, no query killed` a couple times before the query is picked. So my
assumption is that GCs are causing the thread memory measurements to be lower
than the initial/previous thread memory measurements.
Locally, it works reliably at 100k but failed when I initially pushed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]