aho135 commented on code in PR #19146:
URL: https://github.com/apache/druid/pull/19146#discussion_r2966999976
##########
processing/src/test/java/org/apache/druid/segment/StringDimensionIndexerTest.java:
##########
@@ -140,6 +140,43 @@ public void testBinaryInputs()
);
}
+ @Test
+ public void testTruncation()
+ {
+ final StringDimensionIndexer indexer = new StringDimensionIndexer(
+ DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ true,
+ false,
+ 5
+ );
+
+ EncodedKeyComponent<int[]> keyComponent = indexer.processRowValsToUnsortedEncodedKeyComponent("abcdefghij", false);
+ Assert.assertEquals(
+ "abcde",
+ indexer.convertUnsortedEncodedKeyComponentToActualList(keyComponent.getComponent())
+ );
+ }
+
+ @Test
+ public void testMultiValueNotTruncated()
+ {
+ final StringDimensionIndexer indexer = new StringDimensionIndexer(
+ DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ true,
+ false,
+ 5
+ );
+
+ EncodedKeyComponent<int[]> keyComponent = indexer.processRowValsToUnsortedEncodedKeyComponent(
+ Arrays.asList("abcdefghij", "klmnopqrst"),
+ false
+ );
+ Assert.assertEquals(
+ Arrays.asList("abcdefghij", "klmnopqrst"),
+ indexer.convertUnsortedEncodedKeyComponentToActualList(keyComponent.getComponent())
+ );
+ }
+
private long verifyEncodedValues(
Review Comment:
Ah, okay. I see we're truncating multi-value dimensions (MVDs) that contain a
single value. It would be good to add a test case for that as well, if that's
the intended behavior.
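To make the requested case concrete, here is a minimal, self-contained sketch
of the rule as I understand it from this thread. It is not Druid code:
`TruncationSketch`, `truncate`, and `process` are hypothetical names, and
returning a plain string for a one-element list is an assumption rather than
confirmed indexer behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the truncation rule discussed above; not Druid code.
class TruncationSketch
{
  // Truncates one string to maxLength characters. A maxLength of 0 disables
  // truncation, as the docs state; treating negatives the same way is an assumption.
  static String truncate(String value, int maxLength)
  {
    if (value == null || maxLength <= 0 || value.length() <= maxLength) {
      return value;
    }
    return value.substring(0, maxLength);
  }

  // Assumed rule: a lone string or a one-element list is truncated,
  // while a list with two or more values is returned unchanged.
  static Object process(Object rowVals, int maxLength)
  {
    if (rowVals instanceof List) {
      List<?> list = (List<?>) rowVals;
      if (list.size() == 1 && list.get(0) instanceof String) {
        return truncate((String) list.get(0), maxLength);
      }
      return list;
    }
    if (rowVals instanceof String) {
      return truncate((String) rowVals, maxLength);
    }
    return rowVals;
  }

  public static void main(String[] args)
  {
    // Single value: truncated to 5 characters.
    System.out.println(process("abcdefghij", 5));
    // MVD input with a single value: the case that still needs a test.
    System.out.println(process(Collections.singletonList("abcdefghij"), 5));
    // MVD input with two values: left untouched.
    System.out.println(process(Arrays.asList("abcdefghij", "klmnopqrst"), 5));
  }
}
```

A real test would pass `Collections.singletonList("abcdefghij")` to
`processRowValsToUnsortedEncodedKeyComponent` and assert on whatever the
intended result is.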
##########
docs/configuration/index.md:
##########
@@ -1431,6 +1431,7 @@ Additional Peon configs include:
|`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task to be used to store temporary files on disk. This config is generally intended for internal usage. Attempts to set it are very likely to be overwritten by the TaskRunner that executes the task, so be sure of what you expect to happen before directly adjusting this configuration parameter. The config is documented here primarily to provide an understanding of what it means if/when someone sees that it has been set. A value of -1 disables this limit. |-1|
|`druid.indexer.task.allowHadoopTaskExecution`|Conditional dictating if the cluster allows `index_hadoop` tasks to be executed. `index_hadoop` is deprecated, and defaulting to false will force cluster operators to acknowledge the deprecation and consciously opt in to using index_hadoop with the understanding that it will be removed in the future.|false|
|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0|
+|`druid.indexing.formats.maxStringLength`|Maximum number of characters to store per string dimension value. Longer values are truncated during ingestion. Set to 0 to disable. Can be overridden per-dimension using `maxStringLength` in the [dimension object](../ingestion/ingestion-spec.md#dimension-objects).|0 (no truncation)|
Review Comment:
It would be good to mention that truncation does not apply to MVDs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]