clintropolis commented on code in PR #18944:
URL: https://github.com/apache/druid/pull/18944#discussion_r2722981595
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/HadoopDruidIndexerConfig.java:
##########
@@ -129,10 +132,17 @@ public class HadoopDruidIndexerConfig
);
JSON_MAPPER = INJECTOR.getInstance(ObjectMapper.class);
INDEX_IO = INJECTOR.getInstance(IndexIO.class);
- INDEX_MERGER_V9 = INJECTOR.getInstance(IndexMergerV9.class);
HADOOP_KERBEROS_CONFIG = INJECTOR.getInstance(HadoopKerberosConfig.class);
DATA_SEGMENT_PUSHER = INJECTOR.getInstance(DataSegmentPusher.class);
PROPERTIES = INJECTOR.getInstance(Properties.class);
+
+ boolean buildV10 =
Boolean.parseBoolean(PROPERTIES.getProperty(BUILD_V10_KEY, "false"));
Review Comment:
neat, :+1: I ignored hadoop for v10 format stuff, thanks for adding
##########
processing/src/main/java/org/apache/druid/data/input/Rows.java:
##########
@@ -74,15 +77,15 @@ public static List<String> objectToStrings(final Object
inputValue)
return Collections.emptyList();
} else if (inputValue instanceof List) {
// guava's toString function fails on null objects, so please do not use
it
- return ((List<?>)
inputValue).stream().map(String::valueOf).collect(Collectors.toList());
+ return ((List<?>)
inputValue).stream().map(Evals::asString).collect(Collectors.toList());
Review Comment:
i think this is a good change because the old behavior was wack, but I'm
still tracing through to try to determine the actual impacts of this change.
Besides the hadoop impact, which you fix in this PR, this method seems like
it will mostly impact callers of `Row.getDimension` as well as the `toGroupKey`
method of this class since it calls `getDimension`.
luckily there are a relatively small number of 'production' callers of these
methods
`Row.getDimension`:
<img width="831" height="867" alt="Image"
src="https://github.com/user-attachments/assets/5d04de7d-31d3-4358-9ef5-625fb0ee0472"
/>
`Rows.toGroupKey`:
<img width="897" height="697" alt="Image"
src="https://github.com/user-attachments/assets/bc92592a-4f7f-4806-858b-634f6cc94cfc"
/>
which look mostly related to partitioning. I think we need to determine if
the `null` -> `'null'` coercion is important for these callers, and if so, do
the coercion there. I'm uncertain currently but will keep trying to figure it
out.
##########
extensions-core/avro-extensions/src/test/java/org/apache/druid/data/input/AvroStreamInputRowParserTest.java:
##########
@@ -366,8 +367,9 @@ static void assertInputRowCorrect(InputRow inputRow,
List<String> expectedDimens
Lists.transform(SOME_INT_ARRAY_VALUE, String::valueOf),
inputRow.getDimension("someIntArray")
);
+ // For string array, nulls are preserved so use ArrayList (ImmutableList
doesn't support nulls)
Assert.assertEquals(
- Lists.transform(SOME_STRING_ARRAY_VALUE, String::valueOf),
+ SOME_STRING_ARRAY_VALUE.stream().map(v -> v == null ? null :
String.valueOf(v)).collect(Collectors.toList()),
Review Comment:
nit: this can use `Evals.asString`
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/InputRowSerde.java:
##########
@@ -389,15 +389,20 @@ private static void writeStringArray(List<String> values,
ByteArrayDataOutput ou
}
}
+ @Nullable
private static String readString(DataInput in) throws IOException
{
byte[] result = readBytes(in);
- return StringUtils.fromUtf8(result);
+ return result == null ? null : StringUtils.fromUtf8(result);
Review Comment:
i wonder if we should make a `StringUtils.fromUtf8Nullable` that takes
`byte[]` (we already have one for `ByteBuffer`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]