gianm commented on a change in pull request #10219:
URL: https://github.com/apache/druid/pull/10219#discussion_r469592335



##########
File path: 
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -335,6 +339,8 @@ private ColumnAnalysis analyzeComplexColumn(
   {
     try (final ComplexColumn complexColumn = columnHolder != null ? 
(ComplexColumn) columnHolder.getColumn() : null) {
       final boolean hasMultipleValues = capabilities != null && 
capabilities.hasMultipleValues().isTrue();
+      // if we don't know for sure, then we should plan to check for nulls
+      final boolean nullable = capabilities != null && 
capabilities.hasNulls().isMaybeTrue();

Review comment:
       Could you please rename this to `hasNulls`?

##########
File path: 
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -192,6 +193,7 @@ private ColumnAnalysis analyzeNumericColumn(
     return new ColumnAnalysis(
         capabilities.getType().name(),
         capabilities.hasMultipleValues().isTrue(),
+        capabilities.hasNulls().isMaybeTrue(), // if we don't know for sure, 
then we should plan to check for nulls

Review comment:
       This comment doesn't make a lot of sense in the context of 
SegmentAnalyzer. It shouldn't "know" what the fields are being used for, so it 
shouldn't "know" they're going to be used to check for nulls somewhere higher 
up the chain.
   
   Maybe a better thing to do would be:
   
   1) Define "hasNulls" as meaning "definitely has nulls, or can't determine". 
This should be in the docs for the segmentMetadata query too.
   2) Now this comment could read: "If we don't know for sure, return true to 
adhere to the definition of hasNulls"

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilities.java
##########
@@ -38,6 +38,7 @@
   boolean hasSpatialIndexes();
   Capable hasMultipleValues();
   boolean isFilterable();
+  Capable isNullable();

Review comment:
       I do feel strongly that it shouldn't be called `isNullable`. I think we 
should avoid the word "nullable" in core Druid to refer to this concept, 
because one day, we might want to introduce the SQL-style concept of nullable 
vs non-nullable columns. That concept is more of a definition thing than a 
reality thing, and will be confusing to introduce to core if we have the 
reality-thing using the word "nullable".
   
   Also, if we did introduce it to core, we'd probably want to report both (a 
column that is nullable table-wide might not actually have nulls in a specific 
segment, and that's useful to know).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to