gianm commented on code in PR #12763:
URL: https://github.com/apache/druid/pull/12763#discussion_r917449423
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java:
##########
@@ -43,12 +48,14 @@
   private static final String CTX_KEY_BUFFER_GROUPER_MAX_LOAD_FACTOR = "bufferGrouperMaxLoadFactor";
   private static final String CTX_KEY_BUFFER_GROUPER_MAX_SIZE = "bufferGrouperMaxSize";
   private static final String CTX_KEY_MAX_ON_DISK_STORAGE = "maxOnDiskStorage";
-  private static final String CTX_KEY_MAX_SELECTOR_DICTIONARY_SIZE = "maxSelectorDictionarySize";
-  private static final String CTX_KEY_MAX_MERGING_DICTIONARY_SIZE = "maxMergingDictionarySize";
   private static final String CTX_KEY_FORCE_HASH_AGGREGATION = "forceHashAggregation";
   private static final String CTX_KEY_INTERMEDIATE_COMBINE_DEGREE = "intermediateCombineDegree";
   private static final String CTX_KEY_NUM_PARALLEL_COMBINE_THREADS = "numParallelCombineThreads";
   private static final String CTX_KEY_MERGE_THREAD_LOCAL = "mergeThreadLocal";
+  private static final double MERGING_DICTIONARY_HEAP_FRACTION = 0.3;
+  private static final double SELECTOR_DICTIONARY_HEAP_FRACTION = 0.10;

Review Comment:
I added this explanation:

```
+  // Constants for sizing merging and selector dictionaries. Rationale for these constants:
+  // 1) In no case do we want total aggregate dictionary size to exceed 40% of max memory.
+  // 2) In no case do we want any dictionary to exceed 1GB of memory: if heaps are giant, better to spill at
+  // "reasonable" sizes rather than get giant dictionaries. (There is probably some other reason the user
+  // wanted a giant heap, so we shouldn't monopolize it with dictionaries.)
+  // 3) Use somewhat more memory for merging dictionary vs. selector dictionaries, because if a merging
+  // dictionary is full we must spill to disk, whereas if a selector dictionary is full we simply emit
+  // early to the merge buffer. So, a merging dictionary filling up has a more severe impact on
+  // query performance.
```

##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java:
##########
@@ -75,11 +82,11 @@
   @JsonProperty
   // Size of on-heap string dictionary for merging, per-processing-thread; when exceeded, partial results will be
   // emitted to the merge buffer early.
-  private long maxSelectorDictionarySize = 100_000_000L;
+  private long maxSelectorDictionarySize = AUTOMATIC;
   @JsonProperty
   // Size of on-heap string dictionary for merging, per-query; when exceeded, partial results will be spilled to disk
-  private long maxMergingDictionarySize = 100_000_000L;
+  private long maxMergingDictionarySize = AUTOMATIC;

Review Comment:
Good idea. I changed it.
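The sizing rationale in the first review comment (30% of max heap for merging dictionaries, 10% for selector dictionaries, never more than 1GB per dictionary), together with the new `AUTOMATIC` defaults, can be sketched roughly as below. This is a hypothetical illustration, not Druid's code: the class name `DictionarySizing`, the method `resolve`, the sentinel value `-1` for `AUTOMATIC`, reading 1GB as 2^30 bytes, and splitting each heap fraction evenly across dictionaries that can be live at once (for example, one selector dictionary per processing thread) are all assumptions.

```java
// Hypothetical sketch of resolving an AUTOMATIC dictionary size from the JVM heap.
// Assumptions (not taken from Druid): the class/method names, AUTOMATIC == -1,
// 1GB == 2^30 bytes, and an even split of each heap fraction across live dictionaries.
public final class DictionarySizing
{
  // Sentinel meaning "derive the size from the heap"; the real value is not shown in the review.
  public static final long AUTOMATIC = -1L;

  // Fractions from the diff: 0.3 + 0.10 keeps all dictionaries within 40% of max heap.
  public static final double MERGING_DICTIONARY_HEAP_FRACTION = 0.3;
  public static final double SELECTOR_DICTIONARY_HEAP_FRACTION = 0.10;

  // Rationale point 2: no single dictionary above 1GB (assumed here to mean 2^30 bytes).
  public static final long MAX_AUTOMATIC_DICTIONARY_SIZE = 1L << 30;

  private DictionarySizing()
  {
  }

  /**
   * Returns an explicitly configured size unchanged. For AUTOMATIC, takes heapFraction of the
   * max heap, divides it evenly among concurrentDictionaries, and caps the result at 1GB.
   */
  public static long resolve(long configured, double heapFraction, long maxHeapBytes, int concurrentDictionaries)
  {
    if (configured != AUTOMATIC) {
      return configured; // an explicit user setting always wins
    }
    if (concurrentDictionaries <= 0) {
      throw new IllegalArgumentException("concurrentDictionaries must be positive");
    }
    final long budget = (long) (maxHeapBytes * heapFraction); // total bytes for this dictionary kind
    return Math.min(MAX_AUTOMATIC_DICTIONARY_SIZE, budget / concurrentDictionaries);
  }

  public static void main(String[] args)
  {
    // maxMemory() reflects -Xmx (or Long.MAX_VALUE if unbounded, in which case the 1GB cap applies).
    final long maxHeap = Runtime.getRuntime().maxMemory();
    final int threads = Runtime.getRuntime().availableProcessors();
    System.out.println("selector dictionary bytes per thread: "
                       + resolve(AUTOMATIC, SELECTOR_DICTIONARY_HEAP_FRACTION, maxHeap, threads));
    System.out.println("merging dictionary bytes per query: "
                       + resolve(AUTOMATIC, MERGING_DICTIONARY_HEAP_FRACTION, maxHeap, 1));
  }
}
```

In this sketch an explicitly configured value such as the old default of `100_000_000L` passes through untouched, so only the `AUTOMATIC` sentinel triggers heap-based sizing; the 1GB cap is what keeps a very large heap from being monopolized by dictionaries, as point 2 of the rationale describes.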
