This is an automated email from the ASF dual-hosted git repository.

maytasm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new e22986ee98f Add spill file count limit for GroupBy query (#19141)
e22986ee98f is described below

commit e22986ee98fdd5e852a0430d13982da7d12b44a8
Author: Maytas Monsereenusorn <[email protected]>
AuthorDate: Fri Mar 13 01:16:07 2026 -0700

    Add spill file count limit for GroupBy query (#19141)
    
    GroupBy queries that group on high-cardinality dimensions can create a 
large number of spill files. This problem is more likely when queries contain 
many aggregators and/or aggregators with large memory footprints (e.g., 
DataSketch). This is because GroupBy can only hold a limited number of unique 
groupings in memory before flushing to disk — the exact limit depends on the 
size of each row, which is determined by the size of the aggregators. The issue 
arises when GroupBy attempts to m [...]
    
    This PR fixes the issue by introducing a new property: 
druid.query.groupBy.maxSpillFileCount
    The maximum number of spill files allowed per GroupBy query. When the limit 
is reached, the query fails with a ResourceLimitExceededException. This 
property can be used to prevent historical nodes from OOMing due to an 
excessive number of spill files being opened simultaneously during the merge 
phase. Defaults to Integer.MAX_VALUE (unlimited). Can also be set per query via 
the query context key maxSpillFileCount.
    
    Note that this new config, maxSpillFileCount, is complementary to the 
existing maxOnDiskStorage. maxOnDiskStorage limits total bytes across all spill 
files, but cannot prevent a large number of tiny files — a query can create 
hundreds of thousands of spill files while staying well under the byte limit. 
maxSpillFileCount fills this gap by limiting file count directly, which bounds 
the number of simultaneously open file handles during the merge phase. This 
situation arises when aggregat [...]
---
 docs/configuration/index.md                        |  2 ++
 docs/querying/groupbyquery.md                      | 15 +++++++++-
 .../druid/query/groupby/GroupByQueryConfig.java    | 16 +++++++++++
 .../epinephelinae/GroupByMergingQueryRunner.java   |  1 +
 .../groupby/epinephelinae/GroupByRowProcessor.java |  1 +
 .../epinephelinae/LimitedTemporaryStorage.java     |  9 ++++++
 .../groupby/epinephelinae/SpillingGrouper.java     | 13 +++++++++
 .../TemporaryStorageFileLimitException.java        | 32 ++++++++++++++++++++++
 .../query/groupby/GroupByQueryConfigTest.java      |  5 ++++
 .../query/groupby/GroupByQueryRunnerTest.java      | 28 +++++++++++++++++++
 .../epinephelinae/ConcurrentGrouperTest.java       |  3 +-
 website/.spelling                                  |  2 ++
 12 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 713ba2f6b1d..d870795c536 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -2266,6 +2266,7 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|100000000|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|100000000|
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail. See [groupBy 
memory tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|Integer.MAX_VALUE (unlimited)|
 |`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to 
use, per-query, for spilling the result sets to disk when either the merging 
buffer or the dictionary fills up. Set to zero to disable disk spilling for 
queries which don't override `maxOnDiskStorage` in their 
context.|`druid.query.groupBy.maxOnDiskStorage`|
 
 Supported query contexts:
@@ -2275,6 +2276,7 @@ Supported query contexts:
 |`maxSelectorDictionarySize`|Can be used to lower the value of 
`druid.query.groupBy.maxMergingDictionarySize` for this query.|
 |`maxMergingDictionarySize`|Can be used to lower the value of 
`druid.query.groupBy.maxMergingDictionarySize` for this query.|
 |`maxOnDiskStorage`|Can be used to set `maxOnDiskStorage` to a value between 0 
and `druid.query.groupBy.maxOnDiskStorage` for this query. If this query 
context override exceeds `druid.query.groupBy.maxOnDiskStorage`, the query will 
use `druid.query.groupBy.maxOnDiskStorage`. Omitting this from the query 
context will cause the query to use `druid.query.groupBy.defaultOnDiskStorage` 
for `maxOnDiskStorage`|
+|`maxSpillFileCount`|Can be used to override the value of 
`druid.query.groupBy.maxSpillFileCount` for this query.|
 
 ### Advanced configurations
 
diff --git a/docs/querying/groupbyquery.md b/docs/querying/groupbyquery.md
index 58e20fa54d0..cf8aea8f505 100644
--- a/docs/querying/groupbyquery.md
+++ b/docs/querying/groupbyquery.md
@@ -242,7 +242,7 @@ The response for the query above would look something like:
 
 ### Memory tuning and resource limits
 
-When using groupBy, four parameters control resource usage and limits:
+When using groupBy, the following parameters control resource usage and limits:
 
 - `druid.processing.buffer.sizeBytes`: size of the off-heap hash table used 
for aggregation, per query, in bytes. At
 most `druid.processing.numMergeBuffers` of these will be created at once, 
which also serves as an upper limit on the
@@ -254,6 +254,17 @@ rough estimate of the dictionary footprint.
 - `druid.query.groupBy.maxMergingDictionarySize`: size of the on-heap 
query-level dictionary used when grouping on
 any string expression. There is at most one dictionary per 
concurrently-running query; therefore there are up to
 `druid.server.http.numThreads` of these. Note that the size is based on a 
rough estimate of the dictionary footprint.
+- `druid.query.groupBy.maxSpillFileCount`: maximum number of spill files allowed per GroupBy query. When the limit is
+reached, the query fails with a `ResourceLimitExceededException`. This property can be used to prevent historical nodes
+from OOMing due to an excessive number of spill files being opened simultaneously during the merge phase. This config is
+complementary to `maxOnDiskStorage`: `maxOnDiskStorage` limits total bytes across all spill files, but cannot prevent a large
+number of tiny files — a query can create hundreds of thousands of spill files while staying well under the byte limit.
+`maxSpillFileCount` fills this gap by limiting file count directly, which bounds the number of simultaneously open file handles
+during the merge phase. This situation arises on queries that group on high-cardinality dimensions and contain many aggregators
+and/or aggregators with large memory footprints. Aggregators like thetaSketch pre-allocate a large fixed buffer per row in
+memory, causing the buffer to flush frequently with only a small number of rows; since each row corresponds to a unique
+grouping key in a high-cardinality dimension, each sketch has seen very few values at flush time and serializes to only
+a few bytes on disk using the sketch's compact format. Defaults to `Integer.MAX_VALUE` (unlimited).
 - `druid.query.groupBy.maxOnDiskStorage`: amount of space on disk used for 
aggregation, per query, in bytes. By default,
 this is 0, which means aggregation will not use disk.
 
@@ -346,12 +357,14 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries.  If set to `0` 
(automatic), each query's dictionary can use 10% of the Java heap divided by 
`druid.processing.numMergeBuffers`, or 1GB, whichever is smaller.<br /><br 
/>See [Memory tuning and resource limits](#memory-tuning-and-resource-limits) 
for details on changing this property.|0 (automatic)|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. If set to `0` 
(automatic), each query's dictionary uses 30% of the Java heap divided by 
`druid.processing.numMergeBuffers`, or 1GB, whichever is smaller.<br /><br 
/>See [Memory tuning and resource limits](#memory-tuning-and-resource-limits) 
for details on changing this property.|0 ( [...]
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail.<br /><br />See 
[Memory tuning and resource limits](#memory-tuning-and-resource-limits) for 
details on changing this property.|Integer.MAX_VALUE (unlimited)|
 
 Supported query contexts:
 
 |Key|Description|
 |---|-----------|
 |`maxOnDiskStorage`|Can be used to lower the value of 
`druid.query.groupBy.maxOnDiskStorage` for this query.|
+|`maxSpillFileCount`|Can be used to override the value of 
`druid.query.groupBy.maxSpillFileCount` for this query.|
 
 ### Advanced configurations
 
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java
index 23ef9fae820..c1336bc11c2 100644
--- 
a/processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryConfig.java
@@ -53,6 +53,7 @@ public class GroupByQueryConfig
   private static final String CTX_KEY_MAX_ON_DISK_STORAGE = "maxOnDiskStorage";
   private static final String CTX_KEY_MAX_SELECTOR_DICTIONARY_SIZE = 
"maxSelectorDictionarySize";
   private static final String CTX_KEY_MAX_MERGING_DICTIONARY_SIZE = 
"maxMergingDictionarySize";
+  private static final String CTX_KEY_MAX_SPILL_FILE_COUNT = 
"maxSpillFileCount";
   private static final String CTX_KEY_FORCE_HASH_AGGREGATION = 
"forceHashAggregation";
   private static final String CTX_KEY_INTERMEDIATE_COMBINE_DEGREE = 
"intermediateCombineDegree";
   private static final String CTX_KEY_NUM_PARALLEL_COMBINE_THREADS = 
"numParallelCombineThreads";
@@ -94,6 +95,11 @@ public class GroupByQueryConfig
   // Size of on-heap string dictionary for merging, per-query; when exceeded, 
partial results will be spilled to disk
   private HumanReadableBytes maxMergingDictionarySize = 
HumanReadableBytes.valueOf(AUTOMATIC);
 
+  @JsonProperty
+  // Maximum number of spill files per query; when exceeded, the query fails.
+  // This is a safety valve to prevent OOM errors.
+  private int maxSpillFileCount = Integer.MAX_VALUE;
+
   @JsonProperty
   // Max on-disk temporary storage, per-query; when exceeded, the query fails
   private HumanReadableBytes maxOnDiskStorage = HumanReadableBytes.valueOf(0);
@@ -240,6 +246,11 @@ public class GroupByQueryConfig
     return maxOnDiskStorage;
   }
 
+  public int getMaxSpillFileCount()
+  {
+    return maxSpillFileCount;
+  }
+
   /**
    * Mirror maxOnDiskStorage if defaultOnDiskStorage's default is not 
overridden by cluster operator.
    *
@@ -341,6 +352,11 @@ public class GroupByQueryConfig
     newConfig.maxMergingDictionarySize = queryContext
         .getHumanReadableBytes(CTX_KEY_MAX_MERGING_DICTIONARY_SIZE, 
getConfiguredMaxMergingDictionarySize());
 
+    newConfig.maxSpillFileCount = queryContext.getInt(
+        CTX_KEY_MAX_SPILL_FILE_COUNT,
+        getMaxSpillFileCount()
+    );
+
     newConfig.forcePushDownLimit = 
queryContext.getBoolean(CTX_KEY_FORCE_LIMIT_PUSH_DOWN, isForcePushDownLimit());
     newConfig.applyLimitPushDownToSegment = queryContext.getBoolean(
         CTX_KEY_APPLY_LIMIT_PUSH_DOWN_TO_SEGMENT,
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java
index 43bec66de7c..e47f6590ff8 100644
--- 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java
@@ -192,6 +192,7 @@ public class GroupByMergingQueryRunner implements 
QueryRunner<ResultRow>
               final LimitedTemporaryStorage temporaryStorage = new 
LimitedTemporaryStorage(
                   temporaryStorageDirectory,
                   querySpecificConfig.getMaxOnDiskStorage().getBytes(),
+                  querySpecificConfig.getMaxSpillFileCount(),
                   perQueryStats
               );
 
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java
index e2ca5c7e83b..3a4b7a3121f 100644
--- 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java
@@ -109,6 +109,7 @@ public class GroupByRowProcessor
     final LimitedTemporaryStorage temporaryStorage = new 
LimitedTemporaryStorage(
         temporaryStorageDirectory,
         querySpecificConfig.getMaxOnDiskStorage().getBytes(),
+        querySpecificConfig.getMaxSpillFileCount(),
         perQueryStats
     );
 
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedTemporaryStorage.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedTemporaryStorage.java
index 23bc2706a2d..1e71abc06bc 100644
--- 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedTemporaryStorage.java
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedTemporaryStorage.java
@@ -52,6 +52,7 @@ public class LimitedTemporaryStorage implements Closeable
 
   private final File storageDirectory;
   private final long maxBytesUsed;
+  private final int maxFileCount;
 
   private final AtomicLong bytesUsed = new AtomicLong();
   private final Set<File> files = new TreeSet<>();
@@ -63,17 +64,21 @@ public class LimitedTemporaryStorage implements Closeable
   public LimitedTemporaryStorage(
       File storageDirectory,
       long maxBytesUsed,
+      int maxFileCount,
       GroupByStatsProvider.PerQueryStats perQueryStatsContainer
   )
   {
     this.storageDirectory = storageDirectory;
     this.maxBytesUsed = maxBytesUsed;
+    this.maxFileCount = maxFileCount;
     this.perQueryStatsContainer = perQueryStatsContainer;
   }
 
   /**
    * Create a new temporary file. All methods of the returned output stream 
may throw
    * {@link TemporaryStorageFullException} if the temporary storage area fills 
up.
+   * This method may also throw {@link TemporaryStorageFileLimitException} if 
the number of files in the
+   * temporary storage exceeds the configured limit.
    *
    * @return output stream to the file
    *
@@ -86,6 +91,10 @@ public class LimitedTemporaryStorage implements Closeable
       throw new TemporaryStorageFullException(maxBytesUsed);
     }
 
+    if (files.size() >= maxFileCount) {
+      throw new TemporaryStorageFileLimitException(maxFileCount);
+    }
+
     synchronized (files) {
       if (closed) {
         throw new ISE("Closed");
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java
index 688c9f06566..904a7ef8864 100644
--- 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java
@@ -67,6 +67,10 @@ public class SpillingGrouper<KeyType> implements 
Grouper<KeyType>
       0,
       "Not enough disk space to execute this query. Try raising 
druid.query.groupBy.maxOnDiskStorage."
   );
+  private static final AggregateResult MAX_FILE = AggregateResult.partial(
+      0,
+      "Maximum number of spill files reached for this query. Try raising 
druid.query.groupBy.maxSpillFileCount."
+  );
 
   private final AbstractBufferHashGrouper<KeyType> grouper;
   private final KeySerde<KeyType> keySerde;
@@ -82,6 +86,7 @@ public class SpillingGrouper<KeyType> implements 
Grouper<KeyType>
   private final boolean sortHasNonGroupingFields;
 
   private boolean diskFull = false;
+  private boolean maxFileCount = false;
   private boolean spillingAllowed;
 
   public SpillingGrouper(
@@ -183,6 +188,10 @@ public class SpillingGrouper<KeyType> implements 
Grouper<KeyType>
       return DISK_FULL;
     }
 
+    if (maxFileCount) {
+      return MAX_FILE;
+    }
+
     final AggregateResult result = grouper.aggregate(key, keyHash);
 
     if (result.isOk() || !spillingAllowed || temporaryStorage.maxSize() <= 0) {
@@ -199,6 +208,10 @@ public class SpillingGrouper<KeyType> implements 
Grouper<KeyType>
         diskFull = true;
         return DISK_FULL;
       }
+      catch (TemporaryStorageFileLimitException e) {
+        maxFileCount = true;
+        return MAX_FILE;
+      }
       catch (IOException e) {
         throw new RuntimeException(e);
       }
diff --git 
a/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java
 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java
new file mode 100644
index 00000000000..b02bdc649d1
--- /dev/null
+++ 
b/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae;
+
+import org.apache.druid.java.util.common.StringUtils;
+
+import java.io.IOException;
+
+public class TemporaryStorageFileLimitException extends IOException
+{
+  public TemporaryStorageFileLimitException(final int fileCount)
+  {
+    super(StringUtils.format("Cannot write to disk, hit spill file count limit 
of %,d.", fileCount));
+  }
+}
diff --git 
a/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryConfigTest.java
 
b/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryConfigTest.java
index 49b0b035f37..2bfcf0b5aca 100644
--- 
a/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryConfigTest.java
+++ 
b/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryConfigTest.java
@@ -40,6 +40,7 @@ public class GroupByQueryConfigTest
       .put("maxSelectorDictionarySize", "5")
       .put("maxMergingDictionarySize", "6M")
       .put("bufferGrouperMaxLoadFactor", "7")
+      .put("maxSpillFileCount", "123")
       .build();
 
   @Test
@@ -50,6 +51,7 @@ public class GroupByQueryConfigTest
     Assert.assertEquals(true, config.isSingleThreaded());
     Assert.assertEquals(1, config.getBufferGrouperInitialBuckets());
     Assert.assertEquals(4_000_000, config.getMaxOnDiskStorage().getBytes());
+    Assert.assertEquals(123, config.getMaxSpillFileCount());
     Assert.assertEquals(1_000_000, 
config.getDefaultOnDiskStorage().getBytes());
     Assert.assertEquals(5, config.getConfiguredMaxSelectorDictionarySize());
     Assert.assertEquals(6_000_000, 
config.getConfiguredMaxMergingDictionarySize());
@@ -72,6 +74,7 @@ public class GroupByQueryConfigTest
     Assert.assertEquals(true, config2.isSingleThreaded());
     Assert.assertEquals(1, config2.getBufferGrouperInitialBuckets());
     Assert.assertEquals(1_000_000, config2.getMaxOnDiskStorage().getBytes());
+    Assert.assertEquals(123, config2.getMaxSpillFileCount());
     Assert.assertEquals(5, config2.getConfiguredMaxSelectorDictionarySize());
     Assert.assertEquals(6_000_000, 
config2.getConfiguredMaxMergingDictionarySize());
     Assert.assertEquals(7.0, config2.getBufferGrouperMaxLoadFactor(), 0.0);
@@ -94,6 +97,7 @@ public class GroupByQueryConfigTest
                                     .put("maxResults", 2)
                                     .put("maxSelectorDictionarySize", 3)
                                     .put("maxMergingDictionarySize", 4)
+                                    .put("maxSpillFileCount", 333)
                                     .put("applyLimitPushDownToSegment", true)
                                     .put(
                                         
GroupByQueryConfig.CTX_KEY_DEFER_EXPRESSION_DIMENSIONS,
@@ -107,6 +111,7 @@ public class GroupByQueryConfigTest
     Assert.assertEquals(true, config2.isSingleThreaded());
     Assert.assertEquals(1, config2.getBufferGrouperInitialBuckets());
     Assert.assertEquals(3_000_000, config2.getMaxOnDiskStorage().getBytes());
+    Assert.assertEquals(333, config2.getMaxSpillFileCount());
     Assert.assertEquals(3, config2.getConfiguredMaxSelectorDictionarySize());
     Assert.assertEquals(4, config2.getConfiguredMaxMergingDictionarySize());
     Assert.assertEquals(7.0, config2.getBufferGrouperMaxLoadFactor(), 0.0);
diff --git 
a/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
 
b/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
index 9679d9e8602..b0fc2c51583 100644
--- 
a/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
+++ 
b/processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
@@ -2945,6 +2945,34 @@ public class GroupByQueryRunnerTest extends 
InitializedNullHandlingTest
     TestHelper.assertExpectedObjects(expectedResults, results, 
"overide-maxOnDiskStorage");
   }
 
+  @Test
+  public void testMaxSpillFileCountLimitThroughContextOverride()
+  {
+    // Granularity != ALL requires time-ordering.
+    assumeTimeOrdered();
+    
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(new DefaultDimensionSpec("quality", "alias"))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new 
LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.DAY_GRAN)
+        .overrideContext(ImmutableMap.of("maxSpillFileCount", 1, 
GroupByQueryConfig.CTX_KEY_BUFFER_GROUPER_MAX_SIZE, 1))
+        .build();
+
+    List<ResultRow> expectedResults = null;
+    expectedException.expect(ResourceLimitExceededException.class);
+    if (config.getMaxOnDiskStorage().getBytes() > 0) {
+      // The error message always mentions disk if you have spilling enabled 
(maxOnDiskStorage > 0)
+      expectedException.expectMessage("Maximum number of spill files reached 
for this query. Try raising druid.query.groupBy.maxSpillFileCount.");
+    } else {
+      expectedException.expectMessage("Not enough merge buffer memory to 
execute this query");
+    }
+
+    Iterable<ResultRow> results = 
GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "disk-space");
+  }
+
   @Test
   public void testNotEnoughDiskSpaceThroughContextOverride()
   {
diff --git 
a/processing/src/test/java/org/apache/druid/query/groupby/epinephelinae/ConcurrentGrouperTest.java
 
b/processing/src/test/java/org/apache/druid/query/groupby/epinephelinae/ConcurrentGrouperTest.java
index 50e01b0f60c..bb41f92f782 100644
--- 
a/processing/src/test/java/org/apache/druid/query/groupby/epinephelinae/ConcurrentGrouperTest.java
+++ 
b/processing/src/test/java/org/apache/druid/query/groupby/epinephelinae/ConcurrentGrouperTest.java
@@ -152,6 +152,7 @@ public class ConcurrentGrouperTest extends 
InitializedNullHandlingTest
     final LimitedTemporaryStorage temporaryStorage = new 
LimitedTemporaryStorage(
         temporaryFolder.newFolder(),
         1024 * 1024,
+        100000,
         perQueryStats
     );
     final ListeningExecutorService service = 
MoreExecutors.listeningDecorator(exec);
@@ -244,7 +245,7 @@ public class ConcurrentGrouperTest extends 
InitializedNullHandlingTest
           1024,
           0.7f,
           1,
-          new LimitedTemporaryStorage(temporaryFolder.newFolder(), 1024 * 
1024, perQueryStats),
+          new LimitedTemporaryStorage(temporaryFolder.newFolder(), 1024 * 
1024, 100000, perQueryStats),
           new DefaultObjectMapper(),
           concurrencyHint,
           null,
diff --git a/website/.spelling b/website/.spelling
index cc00e32f1a4..f2b87496eb2 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -1824,6 +1824,7 @@ druid.query.groupBy.maxResults.
 groupByStrategy
 maxOnDiskStorage
 maxResults
+maxSpillFileCount
 orderby
 orderbys
 outputName
@@ -1841,6 +1842,7 @@ DefaultDimensionSpec
 druid-hll
 isInputHyperUnique
 pre-join
+pre-allocate
 DefaultLimitSpec
 OrderByColumnSpec
 OrderByColumnSpecs


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
