gianm commented on a change in pull request #10248:
URL: https://github.com/apache/druid/pull/10248#discussion_r468270282
##########
File path:
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -73,16 +73,16 @@ private DimensionHandlerUtils()
)
{
if (capabilities == null) {
- return new StringDimensionHandler(dimensionName, multiValueHandling,
true);
+ return new StringDimensionHandler(dimensionName, multiValueHandling,
true, false);
}
multiValueHandling = multiValueHandling == null ?
MultiValueHandling.ofDefault() : multiValueHandling;
if (capabilities.getType() == ValueType.STRING) {
- if (!capabilities.isDictionaryEncoded()) {
+ if (!capabilities.isDictionaryEncoded().isMaybeTrue()) {
Review comment:
> This is because unknown would only happen from string dimension
indexers of the incremental index which ultimately always produce string
columns with a dictionary, and these handlers this method are making are used
for things like index merging, etc, and so could safely treat unknown from the
dim indexer as true because it wants the desired state rather than the current
state.
Thanks for explaining it. This sounds brittle for the following reason:
1. At the call site here, the check really does mean to require a dictionary.
2. So logically it should treat unknowns as false (to be safe — if it's not
sure if a dictionary exists, and it requires one, it should fail the check).
3. But it sounds like there are cases where an unknown will appear, and we
"know" that it is really meant to be a true. So we can treat unknowns as OK.
But what if a new case appears in a future patch that isn't (3)? Then this
check is too lax. And because the new case is likely to come from a new call
site of `getHandlerFromCapabilities`, whoever adds it will probably not realize
they need to update the check here.
So instead, would it work to keep the check strict, and resolving unknowns
to trues at the current call sites? Then, if a new caller is added, it'd have
to deal with the ambiguity.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]