Re: [PR] improve exception message for mismatched types when merging CapabilitiesBasedFormat (druid)

via GitHub Wed, 26 Feb 2025 13:30:47 -0800


clintropolis commented on code in PR #17760:
URL: https://github.com/apache/druid/pull/17760#discussion_r1972443188



##########
processing/src/main/java/org/apache/druid/segment/column/CapabilitiesBasedFormat.java:
##########
@@ -111,18 +110,18 @@ public ColumnFormat merge(@Nullable ColumnFormat 
otherFormat)
     } else if (!Objects.equals(merged.getComplexTypeName(), 
otherSnapshot.getComplexTypeName())) {
       throw new ISE(
           "Cannot merge columns of type[%s] and [%s]",
-          merged.getComplexTypeName(),
-          otherSnapshot.getComplexTypeName()
+          mergedType,
+          otherType
       );
     }
 
     
merged.setDictionaryEncoded(merged.isDictionaryEncoded().or(otherSnapshot.isDictionaryEncoded()).isTrue());
     
merged.setHasMultipleValues(merged.hasMultipleValues().or(otherSnapshot.hasMultipleValues()).isTrue());
     merged.setDictionaryValuesSorted(
-        
merged.areDictionaryValuesSorted().and(otherSnapshot.areDictionaryValuesSorted()).isTrue()
+        
merged.areDictionaryValuesSorted().or(otherSnapshot.areDictionaryValuesSorted()).isTrue()

Review Comment:
   I think what I _really_ need to do is update the javadoc on 
`CapabilitiesBasedFormat` to indicate that it's purpose is to capture the 
'legacy' column formats that previously relied on merging `ColumnCapabilities` 
to work correctly, which in this case the change in this PR doesn't matter 
because it doesn't affect anything because these 2 properties are not used at 
all for decisions persisting stuff since they are just computed when the 
segment is loaded.
   
   > if the merging always sorts regardless of input sortedness, it should just 
be hard coded to TRUE
   
   This is probably the most correct interpretation of the existing behavior 
for the things that rely on this `ColumnFormat` though it would be kind of odd 
to hard code to true because these settings are only applicable to STRING typed 
columns, which is why they are all using 'OR' logic instead of 'AND'.
   
   Everything going forward should just implement `ColumnFormat` directly to 
capture what makes sense for the columns physical storage details, which would 
be the appropriate place to decide how to merge differing physical formats 
instead of using this capabilities based fallback thingy, so I should add 
javadoc to make this clear that `CapabilitiesBasedFormat` is not a "good" 
implementation of `ColumnFormat`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] improve exception message for mismatched types when merging CapabilitiesBasedFormat (druid)

Reply via email to