gianm commented on a change in pull request #10219:
URL: https://github.com/apache/druid/pull/10219#discussion_r466140881



##########
File path: 
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -100,6 +100,11 @@ public long numRows(Segment segment)
 
     Map<String, ColumnAnalysis> columns = new TreeMap<>();
 
+    Function<String, ColumnCapabilities> adapterCapabilitesFn =

Review comment:
       I realize you just copied this code from somewhere else, but is there a 
way we can do this without `instanceof`? Maybe a new method on StorageAdapter 
with some nice javadocs? This code is pretty brittle otherwise.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedColumnPartSerde.java
##########
@@ -329,6 +331,7 @@ public void read(ByteBuffer buffer, ColumnBuilder builder, 
ColumnConfig columnCo
         );
         builder
             .setHasMultipleValues(hasMultipleValues)
+            .setNullable(firstDictionaryEntry == null)

Review comment:
       What if it's a multi-value column with no explicit nulls, but some empty 
rows? Two questions in that scenario:
   
   1. Should a column like that report "yes" or "no" for having nulls? I'm 
guessing "yes" makes sense, because if you filter on `col is null` then that 
should match empty rows. So it stands to reason that it has nulls in some 
sense, even though you won't see them if you walk through the column.
   2. What will firstDictionaryEntry be?

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##########
@@ -72,22 +74,35 @@ public static ColumnCapabilitiesImpl snapshot(@Nullable 
final ColumnCapabilities
     copy.hasMultipleValues = 
copy.hasMultipleValues.coerceUnknownToBoolean(unknownIsTrue);
     copy.dictionaryValuesSorted = 
copy.dictionaryValuesSorted.coerceUnknownToBoolean(unknownIsTrue);
     copy.dictionaryValuesUnique = 
copy.dictionaryValuesUnique.coerceUnknownToBoolean(unknownIsTrue);
+    copy.nullable = copy.nullable.coerceUnknownToBoolean(unknownIsTrue);

Review comment:
       With this addition, I'm wondering if the `snapshot` method still makes 
sense. What reason is there that callers generally want to set all these 
unknown things to the same boolean? They don't seem incredibly related. Maybe 
we should have callers explicitly resolve all the unknowns in particular 
directions.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/serde/DoubleNumericColumnPartSerdeV2.java
##########
@@ -152,13 +152,17 @@ public Deserializer getDeserializer()
 
       buffer.position(initialPos + offset);
       final ImmutableBitmap bitmap;
+      final boolean isNullable;
       if (buffer.hasRemaining()) {
         bitmap = 
bitmapSerdeFactory.getObjectStrategy().fromByteBufferWithSize(buffer);
+        isNullable = bitmap.size() > 0;

Review comment:
       It would be better to use `!bitmap.isEmpty()`, because `bitmap.size()` 
isn't always cached and could potentially require walking the bitmap. Similar 
comment for the other serde types.

##########
File path: 
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -192,6 +193,7 @@ private ColumnAnalysis analyzeNumericColumn(
     return new ColumnAnalysis(
         capabilities.getType().name(),
         capabilities.hasMultipleValues().isTrue(),
+        capabilities.isNullable().isTrue(),

Review comment:
       When would it be unknown if a column has nulls?
   
   Will bad stuff happen if we return `false` when it actually does have nulls, 
but for some reason it's marked as unknown?

##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -685,6 +685,9 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow 
row)
         }
         dimsKeySize += indexer.estimateEncodedKeyComponentSize(dimsKey);
         // Set column capabilities as data is coming in
+        if (dimsKey == null) {
+          capabilities.setIsNullable(NullHandling.sqlCompatible());

Review comment:
       If it's string won't it still count as a null, even in 
replace-with-default mode?
   
   Btw, consider adding a `NullHandling.isNullable(ValueType)` method if seems 
useful.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilities.java
##########
@@ -38,6 +38,7 @@
   boolean hasSpatialIndexes();
   Capable hasMultipleValues();
   boolean isFilterable();
+  Capable isNullable();

Review comment:
       Broad comment: "hasNulls" would be a better name for this property, 
because it's meant to mean whether or not the column really has nulls in it 
right now.
   
   "isNullable" will make people think of the SQL sense of the term, which is a 
different concept: it means a column could have nulls potentially, even if it 
doesn't right now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to