This is an automated email from the ASF dual-hosted git repository.
gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 01d555e47b Adjust "in" filter null behavior to match "selector". (#12863)
01d555e47b is described below
commit 01d555e47bdd87419242a9327b16cfbe017bcfa4
Author: Gian Merlino <[email protected]>
AuthorDate: Mon Aug 8 09:08:36 2022 -0700
Adjust "in" filter null behavior to match "selector". (#12863)
* Adjust "in" filter null behavior to match "selector".
Now, both of them match numeric nulls if constructed with a "null" value.
This is consistent as far as native execution goes, but doesn't match
the behavior of SQL = and IN. So, to address that, this patch also
updates the docs to clarify that the native filters do match nulls.
This patch also updates the SQL docs to describe how Boolean logic is
handled in addition to how NULL values are handled.
Fixes #12856.
* Fix test.
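The core of the change (visible in the `InDimFilter` diff below) is that the numeric predicates now report a match for null inputs if and only if the original string value set contained `null`. A minimal self-contained sketch of that pattern follows; `LongPredicateWithNulls`, `createLongPredicate`, and `InPredicateSketch` are hypothetical stand-in names, not the actual Druid interfaces.

```java
import java.util.HashSet;
import java.util.Set;

public class InPredicateSketch {
  // Hypothetical stand-in for Druid's DruidLongPredicate.
  interface LongPredicateWithNulls {
    boolean applyLong(long n);
    boolean applyNull();
  }

  // Mirrors the pattern in the InDimFilter change: parse the non-null values
  // into a long set, and remember separately whether the set contained null.
  static LongPredicateWithNulls createLongPredicate(Set<String> values) {
    final Set<Long> longs = new HashSet<>();
    for (String v : values) {
      if (v != null) {
        try {
          longs.add(Long.parseLong(v));
        } catch (NumberFormatException ignored) {
          // non-numeric strings can never match a long-typed column
        }
      }
    }
    final boolean matchNull = values.contains(null);
    return new LongPredicateWithNulls() {
      @Override
      public boolean applyLong(long n) {
        return longs.contains(n);
      }

      @Override
      public boolean applyNull() {
        return matchNull; // before this patch, null inputs effectively never matched
      }
    };
  }

  public static void main(String[] args) {
    Set<String> values = new HashSet<>();
    values.add("10");
    values.add(null); // a "null" entry in the value list now matches numeric nulls
    LongPredicateWithNulls p = createLongPredicate(values);
    System.out.println(p.applyLong(10L)); // true
    System.out.println(p.applyNull());    // true, because the value set contains null
  }
}
```

Note that this native behavior intentionally differs from SQL `IN`, which never matches NULL; the docs changes below spell that out.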
---
docs/querying/filters.md | 15 ++++--
docs/querying/sql-data-types.md | 39 +++++++++-----
.../org/apache/druid/query/filter/InDimFilter.java | 48 +++++++++++++++--
.../apache/druid/segment/filter/InFilterTest.java | 63 ++++++++++++++--------
.../apache/druid/sql/calcite/CalciteQueryTest.java | 32 +++++++++++
5 files changed, 155 insertions(+), 42 deletions(-)
diff --git a/docs/querying/filters.md b/docs/querying/filters.md
index b73270becd..f243ebb411 100644
--- a/docs/querying/filters.md
+++ b/docs/querying/filters.md
@@ -43,7 +43,8 @@ The grammar for a SELECTOR filter is as follows:
"filter": { "type": "selector", "dimension": <dimension_string>, "value": <dimension_value_string> }
```
-This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'`.
+This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'` or `WHERE <dimension_string> IS NULL`
+(if the `value` is `null`).
The selector filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
@@ -226,7 +227,7 @@ In filter can be used to express the following SQL query:
SELECT COUNT(*) AS 'Count' FROM `table` WHERE `outlaw` IN ('Good', 'Bad', 'Ugly')
```
-The grammar for a IN filter is as follows:
+The grammar for a "in" filter is as follows:
```json
{
@@ -236,12 +237,16 @@ The grammar for a IN filter is as follows:
}
```
-The IN filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+The "in" filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
-If an empty `values` array is passed to the IN filter, it will simply return an empty result.
-If the `dimension` is a multi-valued dimension, the IN filter will return true if one of the dimension values is
+If an empty `values` array is passed to the "in" filter, it will simply return an empty result.
+
+If the `dimension` is a multi-valued dimension, the "in" filter will return true if one of the dimension values is
in the `values` array.
+If the `values` array contains `null`, the "in" filter matches null values. This differs from the SQL IN filter, which
+does not match NULL values.
+
## Like filter
Like filters can be used for basic wildcard searches. They are equivalent to the SQL LIKE operator. Special characters
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 90eb7bb167..693a6b6604 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -85,17 +85,30 @@ the `UNNEST` functionality available in some other SQL dialects. Refer to the do
## NULL values
-The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.md#sql-compatible-null-handling)
-controls Druid's NULL handling mode.
-
-In the default value mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL
-standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and
-`col = ''` are equivalent, and both will evaluate to true if `col` contains an empty string. Similarly, the expression
-`COALESCE(col1, col2)` will return `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows,
-the `COUNT(expr)` aggregator will count the number of rows where `expr` is neither null nor the empty string. Numeric
-columns in this mode are not nullable; any null or missing values will be treated as zeroes.
-
-In SQL compatible mode (`false`), NULLs are treated more closely to the SQL standard. The property affects both storage
-and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion
-time and query time. There is some overhead associated with the ability to handle NULLs; see
+The [`druid.generic.useDefaultValueForNull`](../configuration/index.md#sql-compatible-null-handling)
+runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, set this to `false`.
+
+When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid treats NULLs and empty strings
+interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs.
+For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both will evaluate to true if `col`
+contains an empty string. Similarly, the expression `COALESCE(col1, col2)` will return `col2` if `col1` is an empty
+string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator will count the number of rows
+where `expr` is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing
+values will be treated as zeroes.
+
+When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more closely to the SQL standard. In this mode,
+numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property
+affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time
+and query time. There is some overhead associated with the ability to handle NULLs; see
the [segment internals](../design/segments.md#handling-null-values) documentation for more details.
+
+## Boolean logic
+
+The [`druid.expressions.useStrictBooleans`](../configuration/index.md#expression-processing-configurations)
+runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to `true`.
+
+When `druid.expressions.useStrictBooleans = false` (the default mode), Druid uses two-valued logic.
+
+When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued logic for
+[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
+However, even in this mode, Druid uses two-valued logic for filter types other than `expression`.
diff --git a/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java b/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
index 60e07ee09b..2dec044cf1 100644
--- a/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
+++ b/processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
@@ -482,7 +482,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
}
final LongOpenHashSet longHashSet = new LongOpenHashSet(longs);
- return longHashSet::contains;
+ final boolean matchNull = values.contains(null);
+ return new DruidLongPredicate()
+ {
+ @Override
+ public boolean applyLong(long n)
+ {
+ return longHashSet.contains(n);
+ }
+
+ @Override
+ public boolean applyNull()
+ {
+ return matchNull;
+ }
+ };
}
private static DruidFloatPredicate createFloatPredicate(final Set<String> values)
@@ -496,7 +510,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
}
final IntOpenHashSet floatBitsHashSet = new IntOpenHashSet(floatBits);
- return input -> floatBitsHashSet.contains(Float.floatToIntBits(input));
+ final boolean matchNull = values.contains(null);
+ return new DruidFloatPredicate()
+ {
+ @Override
+ public boolean applyFloat(float n)
+ {
+ return floatBitsHashSet.contains(Float.floatToIntBits(n));
+ }
+
+ @Override
+ public boolean applyNull()
+ {
+ return matchNull;
+ }
+ };
}
private static DruidDoublePredicate createDoublePredicate(final Set<String> values)
@@ -510,7 +538,21 @@ public class InDimFilter extends AbstractOptimizableDimFilter implements Filter
}
final LongOpenHashSet doubleBitsHashSet = new LongOpenHashSet(doubleBits);
- return input -> doubleBitsHashSet.contains(Double.doubleToLongBits(input));
+ final boolean matchNull = values.contains(null);
+ return new DruidDoublePredicate()
+ {
+ @Override
+ public boolean applyDouble(double n)
+ {
+ return doubleBitsHashSet.contains(Double.doubleToLongBits(n));
+ }
+
+ @Override
+ public boolean applyNull()
+ {
+ return matchNull;
+ }
+ };
}
@VisibleForTesting
diff --git a/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java b/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
index 7981fdf3d5..23ced8e520 100644
--- a/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/filter/InFilterTest.java
@@ -23,15 +23,10 @@ import com.google.common.base.Function;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
import nl.jqno.equalsverifier.EqualsVerifier;
import org.apache.druid.common.config.NullHandling;
import org.apache.druid.data.input.InputRow;
-import org.apache.druid.data.input.impl.DimensionsSpec;
-import org.apache.druid.data.input.impl.InputRowParser;
-import org.apache.druid.data.input.impl.MapInputRowParser;
-import org.apache.druid.data.input.impl.TimeAndDimsParseSpec;
-import org.apache.druid.data.input.impl.TimestampSpec;
-import org.apache.druid.java.util.common.DateTimes;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.java.util.common.Pair;
import org.apache.druid.js.JavaScriptConfig;
@@ -55,28 +50,20 @@ import org.junit.runners.Parameterized;
import java.io.Closeable;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import java.util.Map;
@RunWith(Parameterized.class)
public class InFilterTest extends BaseFilterTest
{
- private static final String TIMESTAMP_COLUMN = "timestamp";
-
- private static final InputRowParser<Map<String, Object>> PARSER = new MapInputRowParser(
- new TimeAndDimsParseSpec(
- new TimestampSpec(TIMESTAMP_COLUMN, "iso", DateTimes.of("2000")),
- DimensionsSpec.EMPTY
- )
- );
-
private static final List<InputRow> ROWS = ImmutableList.of(
-    PARSER.parseBatch(ImmutableMap.of("dim0", "a", "dim1", "", "dim2", ImmutableList.of("a", "b"))).get(0),
-    PARSER.parseBatch(ImmutableMap.of("dim0", "b", "dim1", "10", "dim2", ImmutableList.of())).get(0),
-    PARSER.parseBatch(ImmutableMap.of("dim0", "c", "dim1", "2", "dim2", ImmutableList.of(""))).get(0),
-    PARSER.parseBatch(ImmutableMap.of("dim0", "d", "dim1", "1", "dim2", ImmutableList.of("a"))).get(0),
-    PARSER.parseBatch(ImmutableMap.of("dim0", "e", "dim1", "def", "dim2", ImmutableList.of("c"))).get(0),
-    PARSER.parseBatch(ImmutableMap.of("dim0", "f", "dim1", "abc")).get(0)
+      makeDefaultSchemaRow("a", "", ImmutableList.of("a", "b"), "2017-07-25", 0.0, 0.0f, 0L),
+      makeDefaultSchemaRow("b", "10", ImmutableList.of(), "2017-07-25", 10.1, 10.1f, 100L),
+      makeDefaultSchemaRow("c", "2", ImmutableList.of(""), "2017-05-25", null, 5.5f, 40L),
+      makeDefaultSchemaRow("d", "1", ImmutableList.of("a"), "2020-01-25", 120.0245, 110.0f, null),
+      makeDefaultSchemaRow("e", "def", ImmutableList.of("c"), null, 60.0, null, 9001L),
+      makeDefaultSchemaRow("f", "abc", null, "2020-01-25", 765.432, 123.45f, 12345L)
);
public InFilterTest(
@@ -363,6 +350,40 @@ public class InFilterTest extends BaseFilterTest
}
+ @Test
+ public void testNumericColumnNullsAndDefaults()
+ {
+ if (canTestNumericNullsAsDefaultValues) {
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0"), null), ImmutableList.of("a", "e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0"), null), ImmutableList.of("a", "c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0"), null), ImmutableList.of("a", "d"));
+      assertFilterMatches(new InDimFilter("f0", Collections.singleton(null), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("d0", Collections.singleton(null), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("l0", Collections.singleton(null), null), ImmutableList.of());
+
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a", "d"));
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet(null, "999"), null), ImmutableList.of());
+    } else {
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("f0", Collections.singleton(null), null), ImmutableList.of("e"));
+      assertFilterMatches(new InDimFilter("d0", Collections.singleton(null), null), ImmutableList.of("c"));
+      assertFilterMatches(new InDimFilter("l0", Collections.singleton(null), null), ImmutableList.of("d"));
+
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet("0", "999"), null), ImmutableList.of("a"));
+      assertFilterMatches(new InDimFilter("f0", Sets.newHashSet(null, "999"), null), ImmutableList.of("e"));
+      assertFilterMatches(new InDimFilter("d0", Sets.newHashSet(null, "999"), null), ImmutableList.of("c"));
+      assertFilterMatches(new InDimFilter("l0", Sets.newHashSet(null, "999"), null), ImmutableList.of("d"));
+    }
+ }
+ }
+
@Test
public void testRequiredColumnRewrite()
{
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
index 666967c410..610590237a 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
@@ -5220,6 +5220,38 @@ public class CalciteQueryTest extends BaseCalciteQueryTest
);
}
+ @Test
+ public void testSqlIsNullToInFilter() throws Exception
+ {
+ testQuery(
+        "SELECT dim1, COUNT(*) FROM druid.foo WHERE dim1 IS NULL OR dim1 = 'abc' OR dim1 = 'def' OR dim1 = 'ghi' "
+        + "GROUP BY dim1",
+ ImmutableList.of(
+ GroupByQuery.builder()
+ .setDataSource(CalciteTests.DATASOURCE1)
+ .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("dim1", "d0")))
+                        .setDimFilter(new InDimFilter("dim1", Arrays.asList("abc", "def", "ghi", null), null))
+ .setAggregatorSpecs(
+ aggregators(
+ new CountAggregatorFactory("a0")
+ )
+ )
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ NullHandling.sqlCompatible() ? ImmutableList.of(
+ new Object[]{"abc", 1L},
+ new Object[]{"def", 1L}
+ ) : ImmutableList.of(
+ new Object[]{"", 1L},
+ new Object[]{"abc", 1L},
+ new Object[]{"def", 1L}
+ )
+ );
+ }
+
@Test
public void testInFilterWith23Elements() throws Exception
{
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]