yashmayya commented on code in PR #13164:
URL: https://github.com/apache/pinot/pull/13164#discussion_r1609726180
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java:
##########
@@ -411,4 +407,18 @@ public String[][] transformToStringValuesMV(ValueBlock
valueBlock) {
}
return _stringValuesMV;
}
+
+ /**
+ * Lazily initialize _valueToMatchingDocsMap, so that map generation is
skipped when filtering excludes all values
+ */
+ private Map<String, RoaringBitmap> getValueToMatchingDocsMap() {
Review Comment:
Just for my understanding, this isn't referring to the filtering done by the
`JsonExtractIndexTransformFunction` itself right (because that's only done
after the map is generated in the `JsonIndexReader`)? So is this referring to
query level filters that might result in none of this transform function's
`transformToXyz` methods being called (and I presume the transform function's
`init` method is still called beforehand)?
##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -368,6 +379,105 @@ public void testGetDataTableOOMGroupBy()
Assert.assertTrue(earlyTerminationOccurred.get());
}
+ /**
+ * Test instrumentation in getMatchingFlattenedDocsMap() from
+ * {@link org.apache.pinot.segment.spi.index.reader.JsonIndexReader}
+ *
+ * Since getMatchingFlattenedDocsMap() can collect a large map before
processing any blocks, it is required to
+ * check for OOM during map generation. This test generates a mutable and
immutable json index, and generates a map
+ * as would happen in json_extract_index execution.
+ *
+ * It is roughly equivalent to running json_extract_index(col, '$.key',
'STRING').
+ */
+ @Test
+ public void testJsonIndexExtractMapOOM()
+ throws Exception {
+ HashMap<String, Object> configs = new HashMap<>();
+ ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+ ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+
LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.OFF);
+
LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.OFF);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO,
0.00f);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO,
0.00f);
+ configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+ "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING,
true);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING,
false);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY,
true);
+
configs.put(CommonConstants.Accounting.CONFIG_OF_MIN_MEMORY_FOOTPRINT_TO_KILL_RATIO,
0.00f);
+
+ PinotConfiguration config = getConfig(2, 2, configs);
+ ResourceManager rm = getResourceManager(2, 2, 1, 1, configs);
+ // init accountant and start watcher task
+ Tracing.ThreadAccountantOps.initializeThreadAccountant(config,
"testJsonIndexExtractMapOOM");
+
+ Supplier<String> randomJsonValue = () -> {
+ Random random = new Random();
+ int length = random.nextInt(1000);
+ StringBuilder sb = new StringBuilder();
+ for (int i = 0; i < length; i++) {
+ sb.append((char) (random.nextInt(26) + 'a'));
+ }
+ return "{\"key\":\"" + sb + "\"}";
+ };
+
+ File indexDir = new File(FileUtils.getTempDirectory(),
"testJsonIndexExtractMapOOM");
+ FileUtils.forceMkdir(indexDir);
+ String colName = "col";
+ try (JsonIndexCreator offHeapIndexCreator = new
OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig());
+ MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new
JsonIndexConfig())) {
+ // build json indexes
+ for (int i = 0; i < 1000000; i++) {
Review Comment:
Why do we need such a large value here even though we're configuring the
query killer's heap usage ratio to kill queries on to be `0.00f`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]