This is an automated email from the ASF dual-hosted git repository.
aho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 0c6a1da299b feat: Add configurable truncation for string columns
(#19146)
0c6a1da299b is described below
commit 0c6a1da299bf14ab352976d8954ab51192c667b9
Author: Jay Kanakiya <[email protected]>
AuthorDate: Fri Mar 20 14:23:02 2026 -0700
feat: Add configurable truncation for string columns (#19146)
* support configurable truncation for string columns
* add spelling for checks
* add validations and update tests
* trigger build
* add single value mvd test and doc update
* Update docs/ingestion/ingestion-spec.md
Co-authored-by: aho135 <[email protected]>
---------
Co-authored-by: aho135 <[email protected]>
---
docs/configuration/index.md | 1 +
docs/ingestion/ingestion-spec.md | 1 +
.../data/input/impl/StringDimensionSchema.java | 35 +++++++++++--
.../org/apache/druid/guice/BuiltInTypesModule.java | 20 ++++++++
.../druid/segment/DefaultColumnFormatConfig.java | 36 ++++++++++++--
.../druid/segment/StringDimensionHandler.java | 15 +++++-
.../druid/segment/StringDimensionIndexer.java | 25 +++++++++-
.../data/input/impl/StringDimensionSchemaTest.java | 4 +-
.../apache/druid/guice/BuiltInTypesModuleTest.java | 38 ++++++++++++--
.../druid/query/scan/NestedDataScanQueryTest.java | 2 +-
.../segment/DefaultColumnFormatsConfigTest.java | 5 +-
.../druid/segment/NestedDataColumnSchemaTest.java | 5 +-
.../druid/segment/StringDimensionIndexerTest.java | 58 ++++++++++++++++++++++
.../java/org/apache/druid/cli/DumpSegmentTest.java | 16 +++---
website/.spelling | 1 +
15 files changed, 236 insertions(+), 26 deletions(-)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index b9ee7d7c4c7..853ec3878a2 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -1424,6 +1424,7 @@ Additional Peon configs include:
|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to
store empty columns during ingestion. When set to true, Druid stores every
column specified in the
[`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). If you use
the string-based schemaless ingestion and don't specify any dimensions to
ingest, you must also set
[`includeAllDimensions`](../ingestion/ingestion-spec.md#dimensionsspec) for
Druid to store empty columns.<br/><br/>If you set `storeEmptyCo [...]
|`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task
to be used to store temporary files on disk. This config is generally intended
for internal usage. Attempts to set it are very likely to be overwritten by the
TaskRunner that executes the task, so be sure of what you expect to happen
before directly adjusting this configuration parameter. The config is
documented here primarily to provide an understanding of what it means if/when
someone sees that it has been s [...]
|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests
served by a task's chat handler. Set to 0 to disable limiting.|0|
+|`druid.indexing.formats.maxStringLength`|Maximum number of characters to
store per string dimension value. Longer values are truncated during ingestion.
Does not apply to multi-value string dimensions. Set to 0 to disable. Can be
overridden per-dimension using `maxStringLength` in the [dimension
object](../ingestion/ingestion-spec.md#dimension-objects).|0 (no truncation)|
If the Peon is running in remote mode, there must be an Overlord up and
running. Peons in remote mode can set the following configurations:
diff --git a/docs/ingestion/ingestion-spec.md b/docs/ingestion/ingestion-spec.md
index fc8b96f19d8..72ec6d793d3 100644
--- a/docs/ingestion/ingestion-spec.md
+++ b/docs/ingestion/ingestion-spec.md
@@ -243,6 +243,7 @@ Dimension objects can have the following components:
| name | The name of the dimension. This will be used as the field name to
read from input records, as well as the column name stored in generated
segments.<br /><br />Note that you can use a [`transformSpec`](#transformspec)
if you want to rename columns during ingestion time. | none (required) |
| createBitmapIndex | For `string` typed dimensions, whether or not bitmap
indexes should be created for the column in generated segments. Creating a
bitmap index requires more storage, but speeds up certain kinds of filtering
(especially equality and prefix filtering). Only supported for `string` typed
dimensions. | `true` |
| multiValueHandling | For `string` typed dimensions, specifies the type of
handling for [multi-value fields](../querying/multi-value-dimensions.md).
Possible values are `array` (ingest string arrays as-is), `sorted_array` (sort
string arrays during ingestion), and `sorted_set` (sort and de-duplicate string
arrays during ingestion). This parameter is ignored for types other than
`string`. | `sorted_array` |
+| maxStringLength | For `string` typed dimensions, the maximum number of
characters to store per value. Longer values are truncated during ingestion.
Does not apply to multi-value string dimensions. Set to 0 to disable. Overrides
the global
[`druid.indexing.formats.maxStringLength`](../configuration/index.md#additional-peon-configuration)
property. | `0` (no truncation) |
#### Inclusions and exclusions
diff --git
a/processing/src/main/java/org/apache/druid/data/input/impl/StringDimensionSchema.java
b/processing/src/main/java/org/apache/druid/data/input/impl/StringDimensionSchema.java
index bd5314c636b..ab00952e867 100644
---
a/processing/src/main/java/org/apache/druid/data/input/impl/StringDimensionSchema.java
+++
b/processing/src/main/java/org/apache/druid/data/input/impl/StringDimensionSchema.java
@@ -21,15 +21,26 @@ package org.apache.druid.data.input.impl;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.guice.BuiltInTypesModule;
import org.apache.druid.segment.DimensionHandler;
import org.apache.druid.segment.StringDimensionHandler;
import org.apache.druid.segment.column.ColumnType;
+import javax.annotation.Nullable;
+
public class StringDimensionSchema extends DimensionSchema
{
private static final boolean DEFAULT_CREATE_BITMAP_INDEX = true;
+ public static int getDefaultMaxStringLength()
+ {
+ return BuiltInTypesModule.getMaxStringLength();
+ }
+
+ private final int maxStringLength;
+
@JsonCreator
public static StringDimensionSchema create(String name)
{
@@ -40,15 +51,33 @@ public class StringDimensionSchema extends DimensionSchema
public StringDimensionSchema(
@JsonProperty("name") String name,
@JsonProperty("multiValueHandling") MultiValueHandling
multiValueHandling,
- @JsonProperty("createBitmapIndex") Boolean createBitmapIndex
+ @JsonProperty("createBitmapIndex") Boolean createBitmapIndex,
+ @JsonProperty("maxStringLength") @Nullable Integer maxStringLength
)
{
super(name, multiValueHandling, createBitmapIndex == null ?
DEFAULT_CREATE_BITMAP_INDEX : createBitmapIndex);
+ this.maxStringLength = maxStringLength != null && maxStringLength > 0 ?
maxStringLength : getDefaultMaxStringLength();
+ }
+
+ public StringDimensionSchema(
+ String name,
+ MultiValueHandling multiValueHandling,
+ Boolean createBitmapIndex
+ )
+ {
+ this(name, multiValueHandling, createBitmapIndex,
getDefaultMaxStringLength());
}
public StringDimensionSchema(String name)
{
- this(name, null, DEFAULT_CREATE_BITMAP_INDEX);
+ this(name, null, DEFAULT_CREATE_BITMAP_INDEX, getDefaultMaxStringLength());
+ }
+
+ @JsonProperty
+ @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+ public int getMaxStringLength()
+ {
+ return maxStringLength;
}
@Override
@@ -73,6 +102,6 @@ public class StringDimensionSchema extends DimensionSchema
@Override
public DimensionHandler getDimensionHandler()
{
- return new StringDimensionHandler(getName(), getMultiValueHandling(),
hasBitmapIndex(), false);
+ return new StringDimensionHandler(getName(), getMultiValueHandling(),
hasBitmapIndex(), false, maxStringLength);
}
}
diff --git
a/processing/src/main/java/org/apache/druid/guice/BuiltInTypesModule.java
b/processing/src/main/java/org/apache/druid/guice/BuiltInTypesModule.java
index 71433b5cce4..e260a4bd8b6 100644
--- a/processing/src/main/java/org/apache/druid/guice/BuiltInTypesModule.java
+++ b/processing/src/main/java/org/apache/druid/guice/BuiltInTypesModule.java
@@ -53,6 +53,7 @@ public class BuiltInTypesModule implements DruidModule
*/
private static DimensionSchema.MultiValueHandling STRING_MV_MODE =
DimensionSchema.MultiValueHandling.SORTED_ARRAY;
private static IndexSpec DEFAULT_INDEX_SPEC = IndexSpec.builder().build();
+ private static int MAX_STRING_LENGTH = 0;
/**
* @return the configured string multi value handling mode from the system
config if set; otherwise, returns
@@ -89,6 +90,7 @@ public class BuiltInTypesModule implements DruidModule
public SideEffectRegisterer
initDimensionHandlerAndMvHandlingMode(DefaultColumnFormatConfig formatsConfig)
{
setStringMultiValueHandlingModeIfConfigured(formatsConfig.getStringMultiValueHandlingMode());
+ setMaxStringLengthIfConfigured(formatsConfig.getMaxStringLength());
setIndexSpecDefaults(formatsConfig.getIndexSpec());
setNestedColumnDefaults(formatsConfig);
@@ -128,6 +130,24 @@ public class BuiltInTypesModule implements DruidModule
}
}
+ private static void setMaxStringLengthIfConfigured(@Nullable Integer
maxStringLength)
+ {
+ if (maxStringLength != null) {
+ MAX_STRING_LENGTH = maxStringLength;
+ }
+ }
+
+ @VisibleForTesting
+ public static void setMaxStringLength(int maxStringLength)
+ {
+ MAX_STRING_LENGTH = maxStringLength;
+ }
+
+ public static int getMaxStringLength()
+ {
+ return MAX_STRING_LENGTH;
+ }
+
private static void setStringMultiValueHandlingModeIfConfigured(@Nullable
String stringMultiValueHandlingMode)
{
if (stringMultiValueHandlingMode != null) {
diff --git
a/processing/src/main/java/org/apache/druid/segment/DefaultColumnFormatConfig.java
b/processing/src/main/java/org/apache/druid/segment/DefaultColumnFormatConfig.java
index 210ec5c686b..19b875b5f6c 100644
---
a/processing/src/main/java/org/apache/druid/segment/DefaultColumnFormatConfig.java
+++
b/processing/src/main/java/org/apache/druid/segment/DefaultColumnFormatConfig.java
@@ -68,6 +68,21 @@ public class DefaultColumnFormatConfig
return stringMultiValueHandlingMode;
}
+ @Nullable
+ private static Integer validateMaxStringLength(@Nullable Integer
maxStringLength)
+ {
+ if (maxStringLength != null && maxStringLength <= 0) {
+ throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+ .ofCategory(DruidException.Category.INVALID_INPUT)
+ .build(
+ "Invalid value[%s] specified for
'druid.indexing.formats.maxStringLength'."
+ + " Value must be a positive integer.",
+ maxStringLength
+ );
+ }
+ return maxStringLength;
+ }
+
@JsonProperty("stringMultiValueHandlingMode")
@Nullable
private final Integer nestedColumnFormatVersion;
@@ -80,11 +95,16 @@ public class DefaultColumnFormatConfig
@Nullable
private final IndexSpec indexSpec;
+ @JsonProperty("maxStringLength")
+ @Nullable
+ private final Integer maxStringLength;
+
@JsonCreator
public DefaultColumnFormatConfig(
@JsonProperty("stringMultiValueHandlingMode") @Nullable String
stringMultiValueHandlingMode,
@JsonProperty("nestedColumnFormatVersion") @Nullable Integer
nestedColumnFormatVersion,
- @JsonProperty("indexSpec") @Nullable IndexSpec indexSpec
+ @JsonProperty("indexSpec") @Nullable IndexSpec indexSpec,
+ @JsonProperty("maxStringLength") @Nullable Integer maxStringLength
)
{
validateMultiValueHandlingMode(stringMultiValueHandlingMode);
@@ -93,6 +113,7 @@ public class DefaultColumnFormatConfig
this.stringMultiValueHandlingMode =
validateMultiValueHandlingMode(stringMultiValueHandlingMode);
this.nestedColumnFormatVersion = nestedColumnFormatVersion;
this.indexSpec = indexSpec;
+ this.maxStringLength = validateMaxStringLength(maxStringLength);
}
@Nullable
@@ -116,6 +137,13 @@ public class DefaultColumnFormatConfig
return indexSpec;
}
+ @Nullable
+ @JsonProperty("maxStringLength")
+ public Integer getMaxStringLength()
+ {
+ return maxStringLength;
+ }
+
@Override
public boolean equals(Object o)
{
@@ -128,13 +156,14 @@ public class DefaultColumnFormatConfig
DefaultColumnFormatConfig that = (DefaultColumnFormatConfig) o;
return Objects.equals(nestedColumnFormatVersion,
that.nestedColumnFormatVersion)
&& Objects.equals(stringMultiValueHandlingMode,
that.stringMultiValueHandlingMode)
- && Objects.equals(indexSpec, that.indexSpec);
+ && Objects.equals(indexSpec, that.indexSpec)
+ && Objects.equals(maxStringLength, that.maxStringLength);
}
@Override
public int hashCode()
{
- return Objects.hash(nestedColumnFormatVersion,
stringMultiValueHandlingMode, indexSpec);
+ return Objects.hash(nestedColumnFormatVersion,
stringMultiValueHandlingMode, indexSpec, maxStringLength);
}
@Override
@@ -144,6 +173,7 @@ public class DefaultColumnFormatConfig
"stringMultiValueHandlingMode=" + stringMultiValueHandlingMode +
", nestedColumnFormatVersion=" + nestedColumnFormatVersion +
", indexSpec=" + indexSpec +
+ ", maxStringLength=" + maxStringLength +
'}';
}
}
diff --git
a/processing/src/main/java/org/apache/druid/segment/StringDimensionHandler.java
b/processing/src/main/java/org/apache/druid/segment/StringDimensionHandler.java
index f20ed3bbc1a..d2b41ab7a4b 100644
---
a/processing/src/main/java/org/apache/druid/segment/StringDimensionHandler.java
+++
b/processing/src/main/java/org/apache/druid/segment/StringDimensionHandler.java
@@ -104,6 +104,7 @@ public class StringDimensionHandler implements
DimensionHandler<Integer, int[],
private final MultiValueHandling multiValueHandling;
private final boolean hasBitmapIndexes;
private final boolean hasSpatialIndexes;
+ private final int maxStringLength;
public StringDimensionHandler(
String dimensionName,
@@ -111,11 +112,23 @@ public class StringDimensionHandler implements
DimensionHandler<Integer, int[],
boolean hasBitmapIndexes,
boolean hasSpatialIndexes
)
+ {
+ this(dimensionName, multiValueHandling, hasBitmapIndexes,
hasSpatialIndexes, StringDimensionSchema.getDefaultMaxStringLength());
+ }
+
+ public StringDimensionHandler(
+ String dimensionName,
+ MultiValueHandling multiValueHandling,
+ boolean hasBitmapIndexes,
+ boolean hasSpatialIndexes,
+ int maxStringLength
+ )
{
this.dimensionName = dimensionName;
this.multiValueHandling = multiValueHandling;
this.hasBitmapIndexes = hasBitmapIndexes;
this.hasSpatialIndexes = hasSpatialIndexes;
+ this.maxStringLength = maxStringLength;
}
@Override
@@ -160,7 +173,7 @@ public class StringDimensionHandler implements
DimensionHandler<Integer, int[],
@Override
public DimensionIndexer<Integer, int[], String> makeIndexer()
{
- return new StringDimensionIndexer(multiValueHandling, hasBitmapIndexes,
hasSpatialIndexes);
+ return new StringDimensionIndexer(multiValueHandling, hasBitmapIndexes,
hasSpatialIndexes, maxStringLength);
}
@Override
diff --git
a/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
b/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
index d1ce3cf48d0..d41fe6fea98 100644
---
a/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
+++
b/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
@@ -24,6 +24,7 @@ import it.unimi.dsi.fastutil.ints.IntArrays;
import org.apache.druid.collections.bitmap.BitmapFactory;
import org.apache.druid.collections.bitmap.MutableBitmap;
import org.apache.druid.data.input.impl.DimensionSchema.MultiValueHandling;
+import org.apache.druid.data.input.impl.StringDimensionSchema;
import org.apache.druid.error.DruidException;
import org.apache.druid.java.util.common.ISE;
import org.apache.druid.java.util.common.StringUtils;
@@ -57,6 +58,7 @@ public class StringDimensionIndexer extends
DictionaryEncodedColumnIndexer<int[]
private final MultiValueHandling multiValueHandling;
private final boolean hasBitmapIndexes;
private final boolean hasSpatialIndexes;
+ private final int maxStringLength;
private volatile boolean hasMultipleValues = false;
public StringDimensionIndexer(
@@ -64,11 +66,30 @@ public class StringDimensionIndexer extends
DictionaryEncodedColumnIndexer<int[]
boolean hasBitmapIndexes,
boolean hasSpatialIndexes
)
+ {
+ this(multiValueHandling, hasBitmapIndexes, hasSpatialIndexes,
StringDimensionSchema.getDefaultMaxStringLength());
+ }
+
+ public StringDimensionIndexer(
+ @Nullable MultiValueHandling multiValueHandling,
+ boolean hasBitmapIndexes,
+ boolean hasSpatialIndexes,
+ int maxStringLength
+ )
{
super(new StringDimensionDictionary());
this.multiValueHandling = multiValueHandling == null ?
MultiValueHandling.ofDefault() : multiValueHandling;
this.hasBitmapIndexes = hasBitmapIndexes;
this.hasSpatialIndexes = hasSpatialIndexes;
+ this.maxStringLength = maxStringLength;
+ }
+
+ private String truncateIfNeeded(String value)
+ {
+ if (maxStringLength > 0 && value != null && value.length() >
maxStringLength) {
+ return value.substring(0, maxStringLength);
+ }
+ return value;
}
@Override
@@ -92,7 +113,7 @@ public class StringDimensionIndexer extends
DictionaryEncodedColumnIndexer<int[]
dimLookup.add(null);
encodedDimensionValues = IntArrays.EMPTY_ARRAY;
} else if (dimValuesList.size() == 1) {
- encodedDimensionValues = new
int[]{dimLookup.add(Evals.asString(dimValuesList.get(0)))};
+ encodedDimensionValues = new
int[]{dimLookup.add(truncateIfNeeded(Evals.asString(dimValuesList.get(0))))};
} else {
hasMultipleValues = true;
final String[] dimensionValues = new String[dimValuesList.size()];
@@ -125,7 +146,7 @@ public class StringDimensionIndexer extends
DictionaryEncodedColumnIndexer<int[]
encodedDimensionValues =
new
int[]{dimLookup.add(Evals.asString(StringUtils.encodeBase64String((byte[])
dimValues)))};
} else {
- encodedDimensionValues = new
int[]{dimLookup.add(Evals.asString(dimValues))};
+ encodedDimensionValues = new
int[]{dimLookup.add(truncateIfNeeded(Evals.asString(dimValues)))};
}
// If dictionary size has changed, the sorted lookup is no longer valid.
diff --git
a/processing/src/test/java/org/apache/druid/data/input/impl/StringDimensionSchemaTest.java
b/processing/src/test/java/org/apache/druid/data/input/impl/StringDimensionSchemaTest.java
index cfc9006fe57..3354ac8b82a 100644
---
a/processing/src/test/java/org/apache/druid/data/input/impl/StringDimensionSchemaTest.java
+++
b/processing/src/test/java/org/apache/druid/data/input/impl/StringDimensionSchemaTest.java
@@ -54,9 +54,11 @@ public class StringDimensionSchemaTest
final String json = "{\n"
+ " \"name\" : \"dim\",\n"
+ " \"multiValueHandling\" : \"SORTED_SET\",\n"
- + " \"createBitmapIndex\" : false\n"
+ + " \"createBitmapIndex\" : false,\n"
+ + " \"maxStringLength\" : 200\n"
+ "}";
final StringDimensionSchema schema = (StringDimensionSchema)
jsonMapper.readValue(json, DimensionSchema.class);
Assert.assertEquals(new StringDimensionSchema("dim",
MultiValueHandling.SORTED_SET, false), schema);
+ Assert.assertEquals(200, schema.getMaxStringLength());
}
}
diff --git
a/processing/src/test/java/org/apache/druid/guice/BuiltInTypesModuleTest.java
b/processing/src/test/java/org/apache/druid/guice/BuiltInTypesModuleTest.java
index fc43eb2c5a6..189a8a2bdf3 100644
---
a/processing/src/test/java/org/apache/druid/guice/BuiltInTypesModuleTest.java
+++
b/processing/src/test/java/org/apache/druid/guice/BuiltInTypesModuleTest.java
@@ -33,9 +33,9 @@ import org.apache.druid.segment.data.CompressionStrategy;
import org.apache.druid.segment.data.ConciseBitmapSerdeFactory;
import org.apache.druid.segment.nested.NestedCommonFormatColumnFormatSpec;
import org.apache.druid.segment.nested.NestedDataComplexTypeSerde;
+import org.junit.After;
import org.junit.AfterClass;
import org.junit.Test;
-import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeAll;
@@ -56,10 +56,11 @@ public class BuiltInTypesModuleTest
DimensionHandlerUtils.DIMENSION_HANDLER_PROVIDERS.remove(NestedDataComplexTypeSerde.TYPE_NAME);
}
- @AfterEach
- public void beforeEach()
+ @After
+ public void teardownEach()
{
BuiltInTypesModule.setIndexSpecDefaults(IndexSpec.builder().build());
+ BuiltInTypesModule.setMaxStringLength(0);
}
@AfterClass
@@ -74,6 +75,7 @@ public class BuiltInTypesModuleTest
);
}
BuiltInTypesModule.setIndexSpecDefaults(IndexSpec.builder().build());
+ BuiltInTypesModule.setMaxStringLength(0);
}
@Test
@@ -95,6 +97,8 @@ public class BuiltInTypesModuleTest
DimensionSchema.MultiValueHandling.SORTED_ARRAY,
BuiltInTypesModule.getStringMultiValueHandlingMode()
);
+
+ Assertions.assertEquals(0, BuiltInTypesModule.getMaxStringLength());
}
@Test
@@ -174,6 +178,34 @@ public class BuiltInTypesModuleTest
));
}
+ @Test
+ public void testMaxStringLengthOverride()
+ {
+ final Properties props = new Properties();
+ props.setProperty("druid.indexing.formats.maxStringLength", "500");
+ final Injector gadget = makeInjector(props);
+
+ gadget.getInstance(BuiltInTypesModule.SideEffectRegisterer.class);
+
+ Assertions.assertEquals(500, BuiltInTypesModule.getMaxStringLength());
+ }
+
+ @Test
+ public void testInvalidMaxStringLength()
+ {
+ final Properties props = new Properties();
+ props.setProperty("druid.indexing.formats.maxStringLength", "-1");
+ final Injector gadget = makeInjector(props);
+
+ final Exception exception = Assertions.assertThrows(
+ Exception.class,
+ () -> gadget.getInstance(BuiltInTypesModule.SideEffectRegisterer.class)
+ );
+ Assertions.assertTrue(exception.getMessage().contains(
+ "Invalid value[-1] specified for
'druid.indexing.formats.maxStringLength'"
+ ));
+ }
+
private Injector makeInjector(Properties props)
{
diff --git
a/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
b/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
index 8d5e66589c3..cd6cd073ac7 100644
---
a/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
+++
b/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
@@ -76,7 +76,7 @@ public class NestedDataScanQueryTest extends
InitializedNullHandlingTest
{
private static final Logger LOG = new Logger(NestedDataScanQueryTest.class);
- DefaultColumnFormatConfig DEFAULT_FORMAT = new
DefaultColumnFormatConfig(null, null, null);
+ DefaultColumnFormatConfig DEFAULT_FORMAT = new
DefaultColumnFormatConfig(null, null, null, null);
@Rule
public final TemporaryFolder tempFolder = new TemporaryFolder();
diff --git
a/processing/src/test/java/org/apache/druid/segment/DefaultColumnFormatsConfigTest.java
b/processing/src/test/java/org/apache/druid/segment/DefaultColumnFormatsConfigTest.java
index 41af9742bfa..5c787e7666d 100644
---
a/processing/src/test/java/org/apache/druid/segment/DefaultColumnFormatsConfigTest.java
+++
b/processing/src/test/java/org/apache/druid/segment/DefaultColumnFormatsConfigTest.java
@@ -34,7 +34,7 @@ public class DefaultColumnFormatsConfigTest
@Test
public void testDefaultsSerde() throws JsonProcessingException
{
- DefaultColumnFormatConfig defaultColumnFormatConfig = new
DefaultColumnFormatConfig(null, null, null);
+ DefaultColumnFormatConfig defaultColumnFormatConfig = new
DefaultColumnFormatConfig(null, null, null, null);
String there = MAPPER.writeValueAsString(defaultColumnFormatConfig);
DefaultColumnFormatConfig andBack = MAPPER.readValue(there,
DefaultColumnFormatConfig.class);
Assert.assertEquals(defaultColumnFormatConfig, andBack);
@@ -45,12 +45,13 @@ public class DefaultColumnFormatsConfigTest
@Test
public void testDefaultsSerdeOverride() throws JsonProcessingException
{
- DefaultColumnFormatConfig defaultColumnFormatConfig = new
DefaultColumnFormatConfig("ARRAY", 5, null);
+ DefaultColumnFormatConfig defaultColumnFormatConfig = new
DefaultColumnFormatConfig("ARRAY", 5, null, null);
String there = MAPPER.writeValueAsString(defaultColumnFormatConfig);
DefaultColumnFormatConfig andBack = MAPPER.readValue(there,
DefaultColumnFormatConfig.class);
Assert.assertEquals(defaultColumnFormatConfig, andBack);
Assert.assertEquals(5, (int) andBack.getNestedColumnFormatVersion());
Assert.assertEquals(DimensionSchema.MultiValueHandling.ARRAY.toString(),
andBack.getStringMultiValueHandlingMode());
+ Assert.assertNull(andBack.getMaxStringLength());
}
@Test
diff --git
a/processing/src/test/java/org/apache/druid/segment/NestedDataColumnSchemaTest.java
b/processing/src/test/java/org/apache/druid/segment/NestedDataColumnSchemaTest.java
index e627827960b..2be76c4e1f5 100644
---
a/processing/src/test/java/org/apache/druid/segment/NestedDataColumnSchemaTest.java
+++
b/processing/src/test/java/org/apache/druid/segment/NestedDataColumnSchemaTest.java
@@ -33,7 +33,7 @@ import org.junit.Test;
public class NestedDataColumnSchemaTest
{
- private static final DefaultColumnFormatConfig DEFAULT_CONFIG = new
DefaultColumnFormatConfig(null, null, null);
+ private static final DefaultColumnFormatConfig DEFAULT_CONFIG = new
DefaultColumnFormatConfig(null, null, null, null);
private static final NestedCommonFormatColumnFormatSpec DEFAULT_NESTED_SPEC =
NestedCommonFormatColumnFormatSpec.builder()
.setObjectFieldsDictionaryEncoding(
@@ -47,7 +47,8 @@ public class NestedDataColumnSchemaTest
private static final DefaultColumnFormatConfig DEFAULT_NESTED_SPEC_CONFIG =
new DefaultColumnFormatConfig(
null,
null,
- IndexSpec.builder().withAutoColumnFormatSpec(DEFAULT_NESTED_SPEC).build()
+
IndexSpec.builder().withAutoColumnFormatSpec(DEFAULT_NESTED_SPEC).build(),
+ null
);
private static final ObjectMapper MAPPER;
diff --git
a/processing/src/test/java/org/apache/druid/segment/StringDimensionIndexerTest.java
b/processing/src/test/java/org/apache/druid/segment/StringDimensionIndexerTest.java
index 7e8adc577b0..da3361c155b 100644
---
a/processing/src/test/java/org/apache/druid/segment/StringDimensionIndexerTest.java
+++
b/processing/src/test/java/org/apache/druid/segment/StringDimensionIndexerTest.java
@@ -26,6 +26,7 @@ import org.junit.Assert;
import org.junit.Test;
import java.util.Arrays;
+import java.util.Collections;
/**
* Unit tests for {@link StringDimensionIndexer}.
@@ -140,6 +141,63 @@ public class StringDimensionIndexerTest extends
InitializedNullHandlingTest
);
}
+ @Test
+ public void testTruncation()
+ {
+ final StringDimensionIndexer indexer = new StringDimensionIndexer(
+ DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ true,
+ false,
+ 5
+ );
+
+ EncodedKeyComponent<int[]> keyComponent =
indexer.processRowValsToUnsortedEncodedKeyComponent("abcdefghij", false);
+ Assert.assertEquals(
+ "abcde",
+
indexer.convertUnsortedEncodedKeyComponentToActualList(keyComponent.getComponent())
+ );
+ }
+
+ @Test
+ public void testSingleValueMvdTruncated()
+ {
+ final StringDimensionIndexer indexer = new StringDimensionIndexer(
+ DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ true,
+ false,
+ 5
+ );
+
+ EncodedKeyComponent<int[]> keyComponent =
indexer.processRowValsToUnsortedEncodedKeyComponent(
+ Collections.singletonList("abcdefghij"),
+ false
+ );
+ Assert.assertEquals(
+ "abcde",
+
indexer.convertUnsortedEncodedKeyComponentToActualList(keyComponent.getComponent())
+ );
+ }
+
+ @Test
+ public void testMultiValueNotTruncated()
+ {
+ final StringDimensionIndexer indexer = new StringDimensionIndexer(
+ DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ true,
+ false,
+ 5
+ );
+
+ EncodedKeyComponent<int[]> keyComponent =
indexer.processRowValsToUnsortedEncodedKeyComponent(
+ Arrays.asList("abcdefghij", "klmnopqrst"),
+ false
+ );
+ Assert.assertEquals(
+ Arrays.asList("abcdefghij", "klmnopqrst"),
+
indexer.convertUnsortedEncodedKeyComponentToActualList(keyComponent.getComponent())
+ );
+ }
+
private long verifyEncodedValues(
StringDimensionIndexer indexer,
Object dimensionValues,
diff --git a/services/src/test/java/org/apache/druid/cli/DumpSegmentTest.java
b/services/src/test/java/org/apache/druid/cli/DumpSegmentTest.java
index 8db258c7f53..c0d86b3e8b0 100644
--- a/services/src/test/java/org/apache/druid/cli/DumpSegmentTest.java
+++ b/services/src/test/java/org/apache/druid/cli/DumpSegmentTest.java
@@ -132,10 +132,10 @@ public class DumpSegmentTest extends
InitializedNullHandlingTest
new InjectableValues.Std()
.addValue(ExprMacroTable.class.getName(),
TestExprMacroTable.INSTANCE)
.addValue(ObjectMapper.class.getName(), mapper)
- .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null))
+ .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null, null))
);
Mockito.when(injector.getInstance(Key.get(ObjectMapper.class,
Json.class))).thenReturn(mapper);
-
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null));
+
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null, null));
List<Segment> segments = createSegments(tempFolder, closer);
QueryableIndex queryableIndex = segments.get(0).as(QueryableIndex.class);
@@ -206,10 +206,10 @@ public class DumpSegmentTest extends
InitializedNullHandlingTest
new InjectableValues.Std()
.addValue(ExprMacroTable.class.getName(),
TestExprMacroTable.INSTANCE)
.addValue(ObjectMapper.class.getName(), mapper)
- .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null))
+ .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null, null))
);
Mockito.when(injector.getInstance(Key.get(ObjectMapper.class,
Json.class))).thenReturn(mapper);
-
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null));
+
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null, null));
List<Segment> segments = createSegments(tempFolder, closer);
QueryableIndex queryableIndex = segments.get(0).as(QueryableIndex.class);
@@ -239,10 +239,10 @@ public class DumpSegmentTest extends
InitializedNullHandlingTest
new InjectableValues.Std()
.addValue(ExprMacroTable.class.getName(),
TestExprMacroTable.INSTANCE)
.addValue(ObjectMapper.class.getName(), mapper)
- .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null))
+ .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null, null))
);
Mockito.when(injector.getInstance(Key.get(ObjectMapper.class,
Json.class))).thenReturn(mapper);
-
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null));
+
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null, null));
List<Segment> segments = createSegments(tempFolder, closer);
QueryableIndex queryableIndex = segments.get(0).as(QueryableIndex.class);
@@ -288,10 +288,10 @@ public class DumpSegmentTest extends
InitializedNullHandlingTest
new InjectableValues.Std()
.addValue(ExprMacroTable.class.getName(),
TestExprMacroTable.INSTANCE)
.addValue(ObjectMapper.class.getName(), mapper)
- .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null))
+ .addValue(DefaultColumnFormatConfig.class, new
DefaultColumnFormatConfig(null, null, null, null))
);
Mockito.when(injector.getInstance(Key.get(ObjectMapper.class,
Json.class))).thenReturn(mapper);
-
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null));
+
Mockito.when(injector.getInstance(DefaultColumnFormatConfig.class)).thenReturn(new
DefaultColumnFormatConfig(null, null, null, null));
File f = buildV10Segment();
diff --git a/website/.spelling b/website/.spelling
index f0c7e5cf609..544e5953b3f 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -427,6 +427,7 @@ maxBytes
maxNumericInFilters
maxNumFiles
maxNumSegments
+maxStringLength
max_map_count
memcached
mergeable
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]