clintropolis commented on code in PR #17760:
URL: https://github.com/apache/druid/pull/17760#discussion_r1972443188
##########
processing/src/main/java/org/apache/druid/segment/column/CapabilitiesBasedFormat.java:
##########
@@ -111,18 +110,18 @@ public ColumnFormat merge(@Nullable ColumnFormat
otherFormat)
} else if (!Objects.equals(merged.getComplexTypeName(),
otherSnapshot.getComplexTypeName())) {
throw new ISE(
"Cannot merge columns of type[%s] and [%s]",
- merged.getComplexTypeName(),
- otherSnapshot.getComplexTypeName()
+ mergedType,
+ otherType
);
}
merged.setDictionaryEncoded(merged.isDictionaryEncoded().or(otherSnapshot.isDictionaryEncoded()).isTrue());
merged.setHasMultipleValues(merged.hasMultipleValues().or(otherSnapshot.hasMultipleValues()).isTrue());
merged.setDictionaryValuesSorted(
-
merged.areDictionaryValuesSorted().and(otherSnapshot.areDictionaryValuesSorted()).isTrue()
+
merged.areDictionaryValuesSorted().or(otherSnapshot.areDictionaryValuesSorted()).isTrue()
Review Comment:
I think what I _really_ need to do is update the javadoc on
`CapabilitiesBasedFormat` to indicate that it's purpose is to capture the
'legacy' column formats that previously relied on merging `ColumnCapabilities`
to work correctly, which in this case the change in this PR doesn't matter
because it doesn't affect anything because these 2 properties are not used at
all for decisions persisting stuff since they are just computed when the
segment is loaded.
> if the merging always sorts regardless of input sortedness, it should just
be hard coded to TRUE
This is probably the most correct interpretation of the existing behavior
for the things that rely on this `ColumnFormat` though it would be kind of odd
to hard code to true because these settings are only applicable to STRING typed
columns, which is why they are all using 'OR' logic instead of 'AND'.
Everything going forward should just implement `ColumnFormat` directly to
capture what makes sense for the columns physical storage details, which would
be the appropriate place to decide how to merge differing physical formats
instead of using this capabilities based fallback thingy, so I should add
javadoc to make this clear that `CapabilitiesBasedFormat` is not a "good"
implementation of `ColumnFormat`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]