This is an automated email from the ASF dual-hosted git repository.

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 01d555e47b Adjust "in" filter null behavior to match "selector". (#12863)
01d555e47b is described below

commit 01d555e47bdd87419242a9327b16cfbe017bcfa4
Author: Gian Merlino <[email protected]>
AuthorDate: Mon Aug 8 09:08:36 2022 -0700

    Adjust "in" filter null behavior to match "selector". (#12863)
    
    * Adjust "in" filter null behavior to match "selector".
    
    Now, both of them match numeric nulls if constructed with a "null" value.
    
    This is consistent as far as native execution goes, but doesn't match
    the behavior of SQL = and IN. So, to address that, this patch also
    updates the docs to clarify that the native filters do match nulls.
    
    This patch also updates the SQL docs to describe how Boolean logic is
    handled in addition to how NULL values are handled.
    
    Fixes #12856.
    
    * Fix test.
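    As an editorial illustration of the behavior change described above (not part of the commit): an "in" filter whose `values` array contains `null` now matches null rows, including numeric nulls. The dimension name below is hypothetical; the shape follows the grammar in docs/querying/filters.md.

    ```json
    {
      "type": "in",
      "dimension": "userId",
      "values": ["12345", "67890", null]
    }
    ```

    After this patch, such a filter matches rows where `userId` is one of the listed strings or is null, mirroring a "selector" filter constructed with `"value": null`.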
---
 docs/querying/filters.md                           | 15 ++++--
 docs/querying/sql-data-types.md                    | 39 +++++++++-----
 .../org/apache/druid/query/filter/InDimFilter.java | 48 +++++++++++++++--
 .../apache/druid/segment/filter/InFilterTest.java  | 63 ++++++++++++++--------
 .../apache/druid/sql/calcite/CalciteQueryTest.java | 32 +++++++++++
 5 files changed, 155 insertions(+), 42 deletions(-)

diff --git a/docs/querying/filters.md b/docs/querying/filters.md
index b73270becd..f243ebb411 100644
--- a/docs/querying/filters.md
+++ b/docs/querying/filters.md
@@ -43,7 +43,8 @@ The grammar for a SELECTOR filter is as follows:
 "filter": { "type": "selector", "dimension": <dimension_string>, "value": 
<dimension_value_string> }
 ```
 
-This is the equivalent of `WHERE <dimension_string> = 
'<dimension_value_string>'`.
+This is the equivalent of `WHERE <dimension_string> = 
'<dimension_value_string>'` or `WHERE <dimension_string> IS NULL`
+(if the `value` is `null`).
 
 The selector filter supports the use of extraction functions, see [Filtering 
with Extraction Functions](#filtering-with-extraction-functions) for details.
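
For concreteness, a sketch of the null-matching selector described in this hunk (hypothetical dimension name, following the grammar above; this example is an editorial aside, not part of the patch):

```json
{ "type": "selector", "dimension": "country", "value": null }
```

Per the doc change above, this is the equivalent of `WHERE country IS NULL`.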
 
@@ -226,7 +227,7 @@ In filter can be used to express the following SQL query:
  SELECT COUNT(*) AS 'Count' FROM `table` WHERE `outlaw` IN ('Good', 'Bad', 'Ugly')
 ```
 
-The grammar for a IN filter is as follows:
+The grammar for a "in" filter is as follows:
 
 ```json
 {
@@ -236,12 +237,16 @@ The grammar for a IN filter is as follows:
 }
 ```
 
-The IN filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+The "in" filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
 
-If an empty `values` array is passed to the IN filter, it will simply return an empty result.
-If the `dimension` is a multi-valued dimension, the IN filter will return true if one of the dimension values is
+If an empty `values` array is passed to the "in" filter, it will simply return an empty result.
+
+If the `dimension` is a multi-valued dimension, the "in" filter will return true if one of the dimension values is
 in the `values` array.
 
+If the `values` array contains `null`, the "in" filter matches null values. This differs from the SQL IN filter, which
+does not match NULL values.
+
 ## Like filter
 
 Like filters can be used for basic wildcard searches. They are equivalent to the SQL LIKE operator. Special characters
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 90eb7bb167..693a6b6604 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -85,17 +85,30 @@ the `UNNEST` functionality available in some other SQL dialects. Refer to the do
 
 ## NULL values
 
-The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.md#sql-compatible-null-handling)
-controls Druid's NULL handling mode.
-
-In the default value mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL
-standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and
-`col = ''` are equivalent, and both will evaluate to true if `col` contains an empty string. Similarly, the expression
-`COALESCE(col1, col2)` will return `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows,
-the `COUNT(expr)` aggregator will count the number of rows where `expr` is neither null nor the empty string. Numeric
-columns in this mode are not nullable; any null or missing values will be treated as zeroes.
-
-In SQL compatible mode (`false`), NULLs are treated more closely to the SQL standard. The property affects both storage
-and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion
-time and query time. There is some overhead associated with the ability to handle NULLs; see
+The [`druid.generic.useDefaultValueForNull`](../configuration/index.md#sql-compatible-null-handling)
+runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, set this to `false`.
+
+When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid treats NULLs and empty strings
+interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs.
+For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both will evaluate to true if `col`
+contains an empty string. Similarly, the expression `COALESCE(col1, col2)` will return `col2` if `col1` is an empty
+string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator will count the number of rows
+where `expr` is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing
+values will be treated as zeroes.
+
+When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more closely to the SQL standard. In this mode,
+numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property
+affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time
+and query time. There is some overhead associated with the ability to handle NULLs; see
 the [segment internals](../design/segments.md#handling-null-values) documentation for more details.
+
+## Boolean logic
+
+The [`druid.expressions.useStrictBooleans`](../configuration/index.md#expression-processing-configurations)
+runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to `true`.
+
+When `druid.expressions.useStrictBooleans = false` (the default mode), Druid uses two-valued logic.
+
+When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued logic for
+[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
+However, even in this mode, Druid uses two-valued logic for filter types other than `expression`.
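
The contrast between the two modes can be sketched outside Druid. The following illustrative Python (with `None` standing in for SQL NULL; this is not Druid code, only a model of the semantics described above) compares two-valued and three-valued evaluation:

```python
# Illustrative sketch: two-valued vs. three-valued boolean logic,
# modeling the druid.expressions.useStrictBooleans behavior described above.
# None stands in for SQL NULL ("unknown").

def and3(a, b):
    """Three-valued AND: a definite False wins; otherwise NULL is contagious."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: a definite True wins; otherwise NULL is contagious."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and2(a, b):
    """Two-valued logic: NULL is coerced to False before combining."""
    return (a is True) and (b is True)

print(and3(None, False))  # False: a definite False decides the result
print(and3(None, True))   # None: the result depends on the unknown operand
print(or3(None, True))    # True
print(and2(None, True))   # False: NULL was coerced to False
```

The key difference, as in the doc text above: in three-valued logic an unknown operand propagates unless the other operand already decides the result, while two-valued logic collapses unknown to false up front.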
diff --git a/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java b/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
index 60e07ee09b..2dec044cf1 100644
--- a/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
+++ b/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
@@ -482,7 +482,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
     }
 
     final LongOpenHashSet longHashSet = new LongOpenHashSet(longs);
-    return longHashSet::contains;
+    final boolean matchNull = values.contains(null);
+    return new DruidLongPredicate()
+    {
+      @Override
+      public boolean applyLong(long n)
+      {
+        return longHashSet.contains(n);
+      }
+
+      @Override
+      public boolean applyNull()
+      {
+        return matchNull;
+      }
+    };
   }
 
  private static DruidFloatPredicate createFloatPredicate(final Set<String> values)
@@ -496,7 +510,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
     }
 
     final IntOpenHashSet floatBitsHashSet = new IntOpenHashSet(floatBits);
-    return input -> floatBitsHashSet.contains(Float.floatToIntBits(input));
+    final boolean matchNull = values.contains(null);
+    return new DruidFloatPredicate()
+    {
+      @Override
+      public boolean applyFloat(float n)
+      {
+        return floatBitsHashSet.contains(Float.floatToIntBits(n));
+      }
+
+      @Override
+      public boolean applyNull()
+      {
+        return matchNull;
+      }
+    };
   }
 
  private static DruidDoublePredicate createDoublePredicate(final Set<String> values)
@@ -510,7 +538,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
     }
 
     final LongOpenHashSet doubleBitsHashSet = new LongOpenHashSet(doubleBits);
-    return input -> doubleBitsHashSet.contains(Double.doubleToLongBits(input));
+    final boolean matchNull = values.contains(null);
+    return new DruidDoublePredicate()
+    {
+      @Override
+      public boolean applyDouble(double n)
+      {
+        return doubleBitsHashSet.contains(Double.doubleToLongBits(n));
+      }
+
+      @Override
+      public boolean applyNull()
+      {
+        return matchNull;
+      }
+    };
   }
 
   @VisibleForTesting
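
The three Java changes above share one pattern: each numeric predicate precomputes whether the original value set contained `null` and reports that flag from `applyNull()`, separately from numeric matching. A minimal Python sketch of the same idea (illustrative only, not the Druid API; the lenient `int()` parse is a stand-in for Druid's long-parsing helper):

```python
# Sketch of the null-aware predicate pattern used in the InDimFilter changes
# above: precompute a match_null flag from the value set, then answer
# "does this filter match a null cell?" separately from numeric matching.

def make_long_predicate(values):
    """values: a set of strings, possibly containing None (like the "in" filter's value set)."""
    longs = set()
    for v in values:
        if v is not None:
            try:
                longs.add(int(v))
            except ValueError:
                pass  # non-numeric strings can never match a long column
    match_null = None in values  # analogous to values.contains(null)

    def apply_long(n):
        return n in longs

    def apply_null():
        return match_null

    return apply_long, apply_null

apply_long, apply_null = make_long_predicate({"0", "999", None})
print(apply_long(0))   # True: 0 is in the parsed value set
print(apply_null())    # True: null was in the value set
```

The flag is computed once at construction time, so the per-row `apply_null()` call stays a constant-time lookup, mirroring how the Java code captures `matchNull` outside the anonymous predicate.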
diff --git a/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java b/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
index 7981fdf3d5..23ced8e520 100644
--- a/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
@@ -23,15 +23,10 @@ import com.google.common.base.Function;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 import nl.jqno.equalsverifier.EqualsVerifier;
 import org.apache.druid.common.config.NullHandling;
 import org.apache.druid.data.input.InputRow;
-import org.apache.druid.data.input.impl.DimensionsSpec;
-import org.apache.druid.data.input.impl.InputRowParser;
-import org.apache.druid.data.input.impl.MapInputRowParser;
-import org.apache.druid.data.input.impl.TimeAndDimsParseSpec;
-import org.apache.druid.data.input.impl.TimestampSpec;
-import org.apache.druid.java.util.common.DateTimes;
 import org.apache.druid.java.util.common.IAE;
 import org.apache.druid.java.util.common.Pair;
 import org.apache.druid.js.JavaScriptConfig;
@@ -55,28 +50,20 @@ import org.junit.runners.Parameterized;
 
 import java.io.Closeable;
 import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 
 @RunWith(Parameterized.class)
 public class InFilterTest extends BaseFilterTest
 {
-  private static final String TIMESTAMP_COLUMN = "timestamp";
-
-  private static final InputRowParser<Map<String, Object>> PARSER = new MapInputRowParser(
-      new TimeAndDimsParseSpec(
-          new TimestampSpec(TIMESTAMP_COLUMN, "iso", DateTimes.of("2000")),
-          DimensionsSpec.EMPTY
-      )
-  );
-
   private static final List<InputRow> ROWS = ImmutableList.of(
-      PARSER.parseBatch(ImmutableMap.of("dim0", "a", "dim1", "", "dim2", ImmutableList.of("a", "b"))).get(0),
-      PARSER.parseBatch(ImmutableMap.of("dim0", "b", "dim1", "10", "dim2", ImmutableList.of())).get(0),
-      PARSER.parseBatch(ImmutableMap.of("dim0", "c", "dim1", "2", "dim2", ImmutableList.of(""))).get(0),
-      PARSER.parseBatch(ImmutableMap.of("dim0", "d", "dim1", "1", "dim2", ImmutableList.of("a"))).get(0),
-      PARSER.parseBatch(ImmutableMap.of("dim0", "e", "dim1", "def", "dim2", ImmutableList.of("c"))).get(0),
-      PARSER.parseBatch(ImmutableMap.of("dim0", "f", "dim1", "abc")).get(0)
+      makeDefaultSchemaRow("a", "", ImmutableList.of("a", "b"), "2017-07-25", 0.0, 0.0f, 0L),
+      makeDefaultSchemaRow("b", "10", ImmutableList.of(), "2017-07-25", 10.1, 10.1f, 100L),
+      makeDefaultSchemaRow("c", "2", ImmutableList.of(""), "2017-05-25", null, 5.5f, 40L),
+      makeDefaultSchemaRow("d", "1", ImmutableList.of("a"), "2020-01-25", 120.0245, 110.0f, null),
+      makeDefaultSchemaRow("e", "def", ImmutableList.of("c"), null, 60.0, null, 9001L),
+      makeDefaultSchemaRow("f", "abc", null, "2020-01-25", 765.432, 123.45f, 12345L)
   );
 
   public InFilterTest(
@@ -363,6 +350,40 @@ public class InFilterTest extends BaseFilterTest
 
   }
 
+  @Test
+  public void testNumericColumnNullsAndDefaults()
+  {
+    if (canTestNumericNullsAsDefaultValues) {
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0"), null), ImmutableList.of("a", "e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0"), null), ImmutableList.of("a", "c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0"), null), ImmutableList.of("a", "d"));
+      assertFilterMatches(new InDimFilter("f0", Collections.singleton(null), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("d0", Collections.singleton(null), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("l0", Collections.singleton(null), null), ImmutableList.of());
+
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "d"));
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+    } else {
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("f0", Collections.singleton(null), null), ImmutableList.of("e"));
+      assertFilterMatches(new InDimFilter("d0", Collections.singleton(null), null), ImmutableList.of("c"));
+      assertFilterMatches(new InDimFilter("l0", Collections.singleton(null), null), ImmutableList.of("d"));
+
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet(null, "999"), null), ImmutableList.of("e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet(null, "999"), null), ImmutableList.of("c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet(null, "999"), null), ImmutableList.of("d"));
+    }
+  }
+
   @Test
   public void testRequiredColumnRewrite()
   {
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
index 666967c410..610590237a 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
@@ -5220,6 +5220,38 @@ public class CalciteQueryTest extends BaseCalciteQueryTest
     );
   }
 
+  @Test
+  public void testSqlIsNullToInFilter() throws Exception
+  {
+    testQuery(
+        "SELECT dim1, COUNT(*) FROM druid.foo WHERE dim1 IS NULL OR dim1 = 
'abc' OR dim1 = 'def' OR dim1 = 'ghi' "
+        + "GROUP BY dim1",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new 
DefaultDimensionSpec("dim1", "d0")))
+                        .setDimFilter(new InDimFilter("dim1", 
Arrays.asList("abc", "def", "ghi", null), null))
+                        .setAggregatorSpecs(
+                            aggregators(
+                                new CountAggregatorFactory("a0")
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        NullHandling.sqlCompatible() ? ImmutableList.of(
+            new Object[]{"abc", 1L},
+            new Object[]{"def", 1L}
+        ) : ImmutableList.of(
+            new Object[]{"", 1L},
+            new Object[]{"abc", 1L},
+            new Object[]{"def", 1L}
+        )
+    );
+  }
+
   @Test
   public void testInFilterWith23Elements() throws Exception
   {

