gianm commented on code in PR #18944:
URL: https://github.com/apache/druid/pull/18944#discussion_r2719884485
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/InputRowSerde.java:
##########
@@ -372,9 +372,9 @@ private static void writeBytes(@Nullable byte[] value,
ByteArrayDataOutput out)
}
}
- private static void writeString(String value, ByteArrayDataOutput out)
throws IOException
+ private static void writeString(@Nullable String value, ByteArrayDataOutput
out) throws IOException
{
- writeBytes(StringUtils.toUtf8(value), out);
+ writeBytes(value == null ? null : StringUtils.toUtf8(value), out);
Review Comment:
could use `StringUtils.toUtf8Nullable`
##########
processing/src/main/java/org/apache/druid/data/input/Rows.java:
##########
@@ -74,7 +76,7 @@ public static List<String> objectToStrings(final Object
inputValue)
return Collections.emptyList();
} else if (inputValue instanceof List) {
// guava's toString function fails on null objects, so please do not use
it
- return ((List<?>)
inputValue).stream().map(String::valueOf).collect(Collectors.toList());
+ return ((List<?>) inputValue).stream().map(v -> v == null ? null :
String.valueOf(v)).collect(Collectors.toList());
Review Comment:
Should change the `Object[]` path too. Also, could use `Evals::asString`.
##########
indexing-hadoop/src/test/java/org/apache/druid/indexer/InputRowSerdeTest.java:
##########
@@ -281,4 +281,180 @@ public void testDimensionNullOrDefaultForNumerics()
Assert.assertEquals(result[52], TypeStrategies.IS_NULL_BYTE);
Assert.assertEquals(result[56], TypeStrategies.IS_NULL_BYTE);
}
+
+
+ @Test
+ public void testMultiValueDimensionWithNulls()
+ {
+ HashMap<String, Object> eventWithNullInMultiValue = new HashMap<>();
+ eventWithNullInMultiValue.put("d1", Arrays.asList("a", "b", null));
+
+ InputRow in = new MapBasedInputRow(
+ timestamp,
+ ImmutableList.of("d1"),
+ eventWithNullInMultiValue
+ );
+
+ DimensionsSpec dimensionsSpec = new DimensionsSpec(
+ Collections.singletonList(new StringDimensionSchema("d1"))
+ );
+
+ byte[] data = InputRowSerde.toBytes(
+ InputRowSerde.getTypeHelperMap(dimensionsSpec),
+ in,
+ new AggregatorFactory[0]
+ ).getSerializedRow();
+
+ InputRow out = InputRowSerde.fromBytes(
+ InputRowSerde.getTypeHelperMap(dimensionsSpec),
+ data,
+ new AggregatorFactory[0]
+ );
+
+ // Null values in multi-value dimensions should be preserved (matching
native batch behavior)
+ List<String> dimValues = out.getDimension("d1");
+ Assert.assertEquals(3, dimValues.size());
+ Assert.assertTrue(dimValues.contains("a"));
+ Assert.assertTrue(dimValues.contains("b"));
+ Assert.assertTrue(dimValues.contains(null)); // null is preserved
+ Assert.assertFalse(dimValues.contains("null"));
+ }
+
+ @Test
+ public void testMultiValueDimensionWithNulls_sqlCompatibleMode()
Review Comment:
I guess this is talking about SQL-compatible null-handling mode, which is no
longer a thing, since we're now always in what used to be called
"SQL-compatible null-handling mode". The references to that being a specific
mode should be removed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]