[druid] branch master updated: enable sql compatible null handling mode by default (#14792)

cwylie Mon, 21 Aug 2023 20:07:56 -0700

This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/master by this push:
     new 5d1412949e enable sql compatible null handling mode by default (#14792)
5d1412949e is described below

commit 5d1412949e4f8ff99c54021da82f34ec842891e2
Author: Clint Wylie <[email protected]>
AuthorDate: Mon Aug 21 20:07:13 2023 -0700

    enable sql compatible null handling mode by default (#14792)
    
    * enable sql compatible null handling mode by default
    * fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false
---
 docs/configuration/index.md                        |  2 +-
 docs/design/segments.md                            | 13 +++---
 docs/ingestion/schema-design.md                    |  2 -
 docs/querying/math-expr.md                         |  6 +--
 docs/querying/sql-aggregations.md                  | 50 +++++++++++-----------
 docs/querying/sql-array-functions.md               |  4 +-
 docs/querying/sql-data-types.md                    | 31 +++++++-------
 docs/querying/sql-functions.md                     |  4 +-
 docs/querying/sql-metadata-tables.md               |  2 +-
 docs/querying/sql-multivalue-string-functions.md   |  4 +-
 docs/querying/sql-query-context.md                 |  2 +-
 .../wikipedia_msq_select_query1.json               |  6 +--
 .../wikipedia_msq_select_query_ha.json             | 12 +++---
 ...wikipedia_msq_select_query_sequential_test.json |  2 +-
 .../testing/utils/AbstractTestQueryHelper.java     |  3 +-
 .../coordinator/duty/ITAutoCompactionTest.java     | 14 +++---
 .../queries/wikipedia_editstream_queries.json      |  2 +-
 .../common/config/NullValueHandlingConfig.java     |  2 +-
 .../aggregation/first/StringFirstAggregator.java   |  6 +--
 .../first/StringFirstBufferAggregator.java         |  6 +--
 .../aggregation/first/StringFirstLastUtils.java    |  3 ++
 .../aggregation/last/StringLastAggregator.java     |  6 +--
 .../last/StringLastBufferAggregator.java           |  6 +--
 23 files changed, 96 insertions(+), 92 deletions(-)
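
For readers skimming the patch below, the two null handling modes can be modelled in a few lines. This is only an illustrative sketch of the behaviour the changed docs describe, not Druid code: the function names and the `column_type` strings are hypothetical, and it assumes the last column of the sql-aggregations.md table gives the result when there is nothing to aggregate.

```python
LONG_MAX = 9223372036854775807   # maximum LONG value, as quoted in the docs
LONG_MIN = -9223372036854775808  # minimum LONG value, as quoted in the docs


def stored_value(value, column_type, use_default_value_for_null):
    """What is kept for an input value (hypothetical helper).

    Legacy mode (flag True) replaces null with '' for strings and 0 for
    numbers; SQL compatible mode (flag False) keeps null as null.
    """
    if value is not None or not use_default_value_for_null:
        return value
    return "" if column_type == "string" else 0


def empty_aggregate(function, use_default_value_for_null):
    """Default result of an aggregation, per the table in the patch."""
    if not use_default_value_for_null:
        return None  # SQL compatible mode: null
    legacy_defaults = {"SUM": 0, "AVG": 0, "MIN": LONG_MAX, "MAX": LONG_MIN}
    return legacy_defaults[function]


def safe_divide(x, y, use_default_value_for_null):
    """Division that yields null (or 0 in legacy mode) when y is 0."""
    if y == 0:
        return 0 if use_default_value_for_null else None
    return x / y


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, stored_value(None, "string", flag),
              empty_aggregate("MIN", flag), safe_divide(1, 0, flag))
```

With the flag set to `False` (the new default after this commit) every function returns null for the null or empty case; setting it to `True` reproduces the legacy defaults.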

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index a234659e9b..362e2d553b 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -798,7 +798,7 @@ Prior to version 0.13.0, Druid string columns treated `''` 
and `null` values as
 
 |Property|Description|Default|
 |---|---|---|
-|`druid.generic.useDefaultValueForNull`|When set to `true`, `null` values will 
be stored as `''` for string columns and `0` for numeric columns. Set to 
`false` to store and query data in SQL compatible mode.|`true`|
+|`druid.generic.useDefaultValueForNull`|Set to `false` to store and query data 
in SQL compatible mode. When set to `true` (legacy mode), `null` values will be 
stored as `''` for string columns and `0` for numeric columns.|`false`|
 |`druid.generic.ignoreNullsForStringCardinality`|When set to `true`, `null` 
values will be ignored for the built-in cardinality aggregator over string 
columns. Set to `false` to include `null` values while estimating cardinality 
of only string columns using the built-in cardinality aggregator. This setting 
takes effect only when `druid.generic.useDefaultValueForNull` is set to `true` 
and is ignored in SQL compatibility mode. Additionally, empty strings 
(equivalent to null) are not counte [...]
 This mode does have a storage size and query performance cost, see [segment 
documentation](../design/segments.md#handling-null-values) for more details.
 
diff --git a/docs/design/segments.md b/docs/design/segments.md
index 5dbc8ba97b..194520045a 100644
--- a/docs/design/segments.md
+++ b/docs/design/segments.md
@@ -82,13 +82,16 @@ For each row in the list of column data, there is only a 
single bitmap that has
 
 ## Handling null values
 
-By default, Druid string dimension columns use the values `''` and `null` 
interchangeably. Numeric and metric columns cannot represent `null` but use 
nulls to mean `0`. However, Druid provides a SQL compatible null handling mode, 
which you can enable at the system level through 
`druid.generic.useDefaultValueForNull`. This setting, when set to `false`, 
allows Druid to create segments _at ingestion time_ in which the following 
occurs:
-* String columns can distinguish `''` from `null`,
-* Numeric columns can represent `null` valued rows instead of `0`.
+By default, Druid stores segments in a SQL compatible null handling mode. 
String columns always store the null value as id 0, the first position in the 
value dictionary, along with an associated entry in the bitmap value indexes 
used to filter null values. Numeric columns also store a null value bitmap 
index that marks the null-valued rows; it is used to null-check aggregations 
and to match null values in filters.
 
-String dimension columns contain no additional column structures in SQL 
compatible null handling mode. Instead, they reserve an additional dictionary 
entry for the `null` value. Numeric columns are stored in the segment with an 
additional bitmap in which the set bits indicate `null`-valued rows. 
+Druid also has a legacy mode which uses default values instead of nulls, which 
was the default prior to Druid 28.0.0. This legacy mode can be enabled by 
setting `druid.generic.useDefaultValueForNull=true`.
 
-In addition to slightly increased segment sizes, SQL compatible null handling 
can incur a performance cost at query time, due to the need to check the null 
bitmap. This performance cost only occurs for columns that actually contain 
null values.
+In legacy mode, Druid segments created _at ingestion time_ have the following 
characteristics:
+
+* String columns cannot distinguish `''` from `null`; the two are treated 
interchangeably as the same value.
+* Numeric columns cannot represent `null`-valued rows and instead store a 
`0`.
+
+In legacy mode, numeric columns do not have the null value bitmap, and so can 
have slightly decreased segment sizes, and queries involving numeric columns 
can have slightly increased performance in some cases since there is no need to 
check the null value bitmap.
 
 ## Segments with different schemas
 
diff --git a/docs/ingestion/schema-design.md b/docs/ingestion/schema-design.md
index 655d88a0e4..556cdc41a4 100644
--- a/docs/ingestion/schema-design.md
+++ b/docs/ingestion/schema-design.md
@@ -263,8 +263,6 @@ native boolean types, Druid ingests these values as strings 
if `druid.expression
 the [array functions](../querying/sql-array-functions.md) or 
[UNNEST](../querying/sql-functions.md#unnest). Nested
 columns can be queried with the [JSON 
functions](../querying/sql-json-functions.md).
 
-We also highly recommend setting `druid.generic.useDefaultValueForNull=false` 
when using these columns since it also enables out of the box `ARRAY` type 
filtering. If not set to `false`, setting `sqlUseBoundsAndSelectors` to `false` 
on the [SQL query context](../querying/sql-query-context.md) can enable `ARRAY` 
filtering instead.
-
 Mixed type columns are stored in the _least_ restrictive type that can 
represent all values in the column. For example:
 
 - Mixed numeric columns are `DOUBLE`
diff --git a/docs/querying/math-expr.md b/docs/querying/math-expr.md
index a204bc9bca..3da1fd3981 100644
--- a/docs/querying/math-expr.md
+++ b/docs/querying/math-expr.md
@@ -161,7 +161,7 @@ See javadoc of java.lang.Math for detailed explanation for 
each function.
 |remainder|remainder(x, y) returns the remainder operation on two arguments as 
prescribed by the IEEE 754 standard|
 |rint|rint(x) returns value that is closest in value to x and is equal to a 
mathematical integer|
 |round|round(x, y) returns the value of the x rounded to the y decimal places. 
While x can be an integer or floating-point number, y must be an integer. The 
type of the return value is specified by that of x. y defaults to 0 if omitted. 
When y is negative, x is rounded on the left side of the y decimal points. If x 
is `NaN`, x returns 0. If x is infinity, x will be converted to the nearest 
finite double. |
-|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal 
to 0. In case y is 0 it returns 0 or `null` if 
`druid.generic.useDefaultValueForNull=false` |
+|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal 
to 0. If y is 0, it returns `null`, or 0 if 
`druid.generic.useDefaultValueForNull=true` (legacy mode) |
 |scalb|scalb(d, sf) returns d * 2^sf rounded as if performed by a single 
correctly rounded floating-point multiply to a member of the double value set|
 |signum|signum(x) returns the signum function of the argument x|
 |sin|sin(x) returns the trigonometric sine of an angle x|
@@ -183,8 +183,8 @@ See javadoc of java.lang.Math for detailed explanation for 
each function.
 | array_ordinal(arr,long) | returns the array element at the 1 based index 
supplied, or null for an out of range index |
 | array_contains(arr,expr) | returns 1 if the array contains the element 
specified by expr, or contains all elements specified by expr if expr is an 
array, else 0 |
 | array_overlap(arr1,arr2) | returns 1 if arr1 and arr2 have any elements in 
common, else 0 |
-| array_offset_of(arr,expr) | returns the 0 based index of the first 
occurrence of expr in the array, or `-1` or `null` if 
`druid.generic.useDefaultValueForNull=false`if no matching elements exist in 
the array. |
-| array_ordinal_of(arr,expr) | returns the 1 based index of the first 
occurrence of expr in the array, or `-1` or `null` if 
`druid.generic.useDefaultValueForNull=false` if no matching elements exist in 
the array. |
+| array_offset_of(arr,expr) | returns the 0 based index of the first 
occurrence of expr in the array, or `null` or `-1` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode) if no matching 
elements exist in the array. |
+| array_ordinal_of(arr,expr) | returns the 1 based index of the first 
occurrence of expr in the array, or `null` or `-1` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode) if no matching 
elements exist in the array. |
 | array_prepend(expr,arr) | adds expr to arr at the beginning, the resulting 
array type determined by the type of the array |
 | array_append(arr,expr) | appends expr to arr, the resulting array type 
determined by the type of the first array |
 | array_concat(arr1,arr2) | concatenates 2 arrays, the resulting array type 
determined by the type of the first array |
diff --git a/docs/querying/sql-aggregations.md 
b/docs/querying/sql-aggregations.md
index 4cb30cd193..f9233d40f7 100644
--- a/docs/querying/sql-aggregations.md
+++ b/docs/querying/sql-aggregations.md
@@ -71,41 +71,41 @@ In the aggregation functions supported by Druid, only 
`COUNT`, `ARRAY_AGG`, and
 |--------|-----|-------|
 |`COUNT(*)`|Counts the number of rows.|`0`|
 |`COUNT(DISTINCT expr)`|Counts distinct values of `expr`.<br /><br />When 
`useApproximateCountDistinct` is set to "true" (the default), this is an alias 
for `APPROX_COUNT_DISTINCT`. The specific algorithm depends on the value of 
[`druid.sql.approxCountDistinct.function`](../configuration/index.md#sql). In 
this mode, you can use strings, numbers, or prebuilt sketches. If counting 
prebuilt sketches, the prebuilt sketch type must match the selected 
algorithm.<br /><br />When `useApproximate [...]
-|`SUM(expr)`|Sums numbers.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`MIN(expr)`|Takes the minimum of numbers.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `9223372036854775807` 
(maximum LONG value)|
-|`MAX(expr)`|Takes the maximum of numbers.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `-9223372036854775808` 
(minimum LONG value)|
-|`AVG(expr)`|Averages numbers.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
+|`SUM(expr)`|Sums numbers.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`MIN(expr)`|Takes the minimum of numbers.|`null` or `9223372036854775807` 
(maximum LONG value) if `druid.generic.useDefaultValueForNull=true` (legacy 
mode)|
+|`MAX(expr)`|Takes the maximum of numbers.|`null` or `-9223372036854775808` 
(minimum LONG value) if `druid.generic.useDefaultValueForNull=true` (legacy 
mode)|
+|`AVG(expr)`|Averages numbers.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
 |`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of `expr` using an 
approximate algorithm. The `expr` can be a regular column or a prebuilt sketch 
column.<br /><br />The specific algorithm depends on the value of 
[`druid.sql.approxCountDistinct.function`](../configuration/index.md#sql). By 
default, this is `APPROX_COUNT_DISTINCT_BUILTIN`. If the [DataSketches 
extension](../development/extensions-core/datasketches-extension.md) is loaded, 
you can set it to `APPROX_COUNT_DISTINCT_DS_H [...]
 |`APPROX_COUNT_DISTINCT_BUILTIN(expr)`|_Usage note:_ consider using 
`APPROX_COUNT_DISTINCT_DS_HLL` instead, which offers better accuracy in many 
cases.<br/><br/>Counts distinct values of `expr` using Druid's built-in 
"cardinality" or "hyperUnique" aggregators, which implement a variant of 
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). The 
`expr` can be a string, a number, or a prebuilt hyperUnique column. Results are 
always approximate, regardless of the value  [...]
 |`APPROX_QUANTILE(expr, probability, [resolution])`|_Deprecated._ Use 
`APPROX_QUANTILE_DS` instead, which provides a superior 
distribution-independent algorithm with formal error 
guarantees.<br/><br/>Computes approximate quantiles on numeric or 
[approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator)
 expressions. `probability` should be between 0 and 1, exclusive. `resolution` 
is the number of centroids to use for the computation. Highe [...]
 |`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, 
upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric 
or [fixed buckets 
histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram)
 expressions. `probability` should be between 0 and 1, exclusive. The 
`numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters 
are described in the fixed buckets histogram documentation. Load the 
[approximat [...]
 |`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced 
by `expr`, with `numEntries` maximum number of distinct values before false 
positive rate increases. See [bloom filter 
extension](../development/extensions-core/bloom-filter.md) documentation for 
additional details.|Empty base64 encoded bloom filter STRING|
-|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See 
[stats extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`EARLIEST(expr)`|Returns the earliest value of `expr`, which must be numeric. 
If `expr` comes from a relation with a timestamp column (like `__time` in a 
Druid datasource), the "earliest" is taken from the row with the overall 
earliest non-null value of the timestamp column. If the earliest non-null value 
of the timestamp column appears in multiple rows, the `expr` may be taken from 
any of those rows. If `expr` does not come from a relation with a timestamp, 
then it is simply the first  [...]
-|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit are truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `''`|
-|`EARLIEST_BY(expr, timestampExpr)`|Returns the earliest value of `expr`, 
which must be numeric. The earliest value of `expr` is taken from the row with 
the overall earliest non-null value of `timestampExpr`. If the earliest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`EARLIEST_BY(expr, timestampExpr, maxBytesPerString)`| Like 
`EARLIEST_BY(expr, timestampExpr)`, but for strings. The `maxBytesPerString` 
parameter determines how much aggregation space to allocate per string. Strings 
longer than this limit are truncated. This parameter should be set as low as 
possible, since high values will lead to wasted memory.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `''`|
-|`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. The 
`expr` must come from a relation with a timestamp column (like `__time` in a 
Druid datasource) and the "latest" is taken from the row with the overall 
latest non-null value of the timestamp column. If the latest non-null value of 
the timestamp column appears in multiple rows, the `expr` may be taken from any 
of those rows. |`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `0`|
-|`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit are truncated. This parameter should 
be set as low as possible, since high values will lead to wasted memory.|`null` 
if `druid.generic.useDefaultValueForNull=false`, otherwise `''`|
-|`LATEST_BY(expr, timestampExpr)`|Returns the latest value of `expr`, which 
must be numeric. The latest value of `expr` is taken from the row with the 
overall latest non-null value of `timestampExpr`. If the overall latest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`LATEST_BY(expr, timestampExpr, maxBytesPerString)`|Like `LATEST_BY(expr, 
timestampExpr)`, but for strings. The `maxBytesPerString` parameter determines 
how much aggregation space to allocate per string. Strings longer than this 
limit are truncated. This parameter should be set as low as possible, since 
high values will lead to wasted memory.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `''`|
-|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be 
numeric. This aggregator can simplify and optimize the performance by returning 
the first encountered value (including null)|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit are truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|`null` if `druid.generic.useDefaultValueForNull=false`, 
otherwise `''`|
+|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See 
[stats extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats 
extension](../development/extensions-core/stats.md) documentation for 
additional details.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`EARLIEST(expr)`|Returns the earliest value of `expr`, which must be numeric. 
If `expr` comes from a relation with a timestamp column (like `__time` in a 
Druid datasource), the "earliest" is taken from the row with the overall 
earliest non-null value of the timestamp column. If the earliest non-null value 
of the timestamp column appears in multiple rows, the `expr` may be taken from 
any of those rows. If `expr` does not come from a relation with a timestamp, 
then it is simply the first  [...]
+|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit are truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|`null` or `''` if `druid.generic.useDefaultValueForNull=true` 
(legacy mode)|
+|`EARLIEST_BY(expr, timestampExpr)`|Returns the earliest value of `expr`, 
which must be numeric. The earliest value of `expr` is taken from the row with 
the overall earliest non-null value of `timestampExpr`. If the earliest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`EARLIEST_BY(expr, timestampExpr, maxBytesPerString)`| Like 
`EARLIEST_BY(expr, timestampExpr)`, but for strings. The `maxBytesPerString` 
parameter determines how much aggregation space to allocate per string. Strings 
longer than this limit are truncated. This parameter should be set as low as 
possible, since high values will lead to wasted memory.|`null` or `''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. The 
`expr` must come from a relation with a timestamp column (like `__time` in a 
Druid datasource) and the "latest" is taken from the row with the overall 
latest non-null value of the timestamp column. If the latest non-null value of 
the timestamp column appears in multiple rows, the `expr` may be taken from any 
of those rows. |`null` or `0` if `druid.generic.useDefaultValueForNull=true` 
(legacy mode)|
+|`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit are truncated. This parameter should 
be set as low as possible, since high values will lead to wasted memory.|`null` 
or `''` if `druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`LATEST_BY(expr, timestampExpr)`|Returns the latest value of `expr`, which 
must be numeric. The latest value of `expr` is taken from the row with the 
overall latest non-null value of `timestampExpr`. If the overall latest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`LATEST_BY(expr, timestampExpr, maxBytesPerString)`|Like `LATEST_BY(expr, 
timestampExpr)`, but for strings. The `maxBytesPerString` parameter determines 
how much aggregation space to allocate per string. Strings longer than this 
limit are truncated. This parameter should be set as low as possible, since 
high values will lead to wasted memory.|`null` or `''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be 
numeric. This aggregator can simplify and optimize the performance by returning 
the first encountered value (including null)|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit are truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|`null` or `''` if `druid.generic.useDefaultValueForNull=true` 
(legacy mode)|
 |`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy 
dimension is included in a row, when using `GROUPING SETS`. Refer to 
[additional documentation](aggregations.md#grouping-aggregator) on how to infer 
this number.|N/A|
 |`ARRAY_AGG(expr, [size])`|Collects all values of `expr` into an ARRAY, 
including null values, with `size` in bytes limit on aggregation size (default 
of 1024 bytes). If the aggregated array grows larger than the maximum size in 
bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression 
is not currently supported, and the ordering of results within the output array 
may vary depending on processing order.|`null`|
 |`ARRAY_AGG(DISTINCT expr, [size])`|Collects all distinct values of `expr` 
into an ARRAY, including null values, with `size` in bytes limit on aggregation 
size (default of 1024 bytes) per aggregate. If the aggregated array grows 
larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` 
within the `ARRAY_AGG` expression is not currently supported, and the ordering 
of results will be based on the default for the element type.|`null`|
 |`ARRAY_CONCAT_AGG(expr, [size])`|Concatenates all array `expr` into a single 
ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes).  
 Input `expr` _must_ be an array. Null `expr` will be ignored, but any null 
values within an `expr` _will_ be included in the resulting array. If the 
aggregated array grows larger than the maximum size in bytes, the query will 
fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not 
currently supported, and the orderi [...]
 |`ARRAY_CONCAT_AGG(DISTINCT expr, [size])`|Concatenates all distinct values of 
all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation 
size (default of 1024 bytes) per aggregate. Input `expr` _must_ be an array. 
Null `expr` will be ignored, but any null values within an `expr` _will_ be 
included in the resulting array. If the aggregated array grows larger than the 
maximum size in bytes, the query will fail. Use of `ORDER BY` within the 
`ARRAY_CONCAT_AGG` expressio [...]
-|`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or 
all distinct values) of `expr` into a single STRING, ignoring null values. Each 
value is joined by an optional `separator`, which must be a literal STRING. If 
the `separator` is not provided, strings are concatenated without a 
separator.<br /><br />An optional `size` in bytes can be supplied to limit 
aggregation size (default of 1024 bytes). If the aggregated string grows larger 
than the maximum size in bytes, th [...]
-|`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for 
`STRING_AGG`.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise 
`''`|
-|`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null` 
if `druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`BIT_OR(expr)`|Performs a bitwise OR operation on all input values.|`null` if 
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
-|`BIT_XOR(expr)`|Performs a bitwise XOR operation on all input values.|`null` 
if `druid.generic.useDefaultValueForNull=false`, otherwise `0`|
+|`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or 
all distinct values) of `expr` into a single STRING, ignoring null values. Each 
value is joined by an optional `separator`, which must be a literal STRING. If 
the `separator` is not provided, strings are concatenated without a 
separator.<br /><br />An optional `size` in bytes can be supplied to limit 
aggregation size (default of 1024 bytes). If the aggregated string grows larger 
than the maximum size in bytes, th [...]
+|`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for 
`STRING_AGG`.|`null` or `''` if `druid.generic.useDefaultValueForNull=true` 
(legacy mode)|
+|`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null` 
or `0` if `druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`BIT_OR(expr)`|Performs a bitwise OR operation on all input values.|`null` or 
`0` if `druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`BIT_XOR(expr)`|Performs a bitwise XOR operation on all input values.|`null` 
or `0` if `druid.generic.useDefaultValueForNull=true` (legacy mode)|
 
 ## Sketch functions
 
diff --git a/docs/querying/sql-array-functions.md 
b/docs/querying/sql-array-functions.md
index 460a0868bb..b39c5d526b 100644
--- a/docs/querying/sql-array-functions.md
+++ b/docs/querying/sql-array-functions.md
@@ -54,8 +54,8 @@ The following table describes array functions. To learn more 
about array aggrega
 |`ARRAY_ORDINAL(arr, long)`|Returns the array element at the 1-based index 
supplied, or null for an out of range index.|
 |`ARRAY_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` 
contains `expr`. If `expr` is an array, returns 1 if `arr` contains all 
elements of `expr`. Otherwise returns 0.|
 |`ARRAY_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements 
in common, else 0.|
-|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first 
occurrence of `expr` in the array. If no matching elements exist in the array, 
returns `-1` or `null` if `druid.generic.useDefaultValueForNull=false`.|
-|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first 
occurrence of `expr` in the array. If no matching elements exist in the array, 
returns `-1` or `null` if `druid.generic.useDefaultValueForNull=false`.|
+|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first 
occurrence of `expr` in the array. If no matching elements exist in the array, 
returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy 
mode).|
+|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first 
occurrence of `expr` in the array. If no matching elements exist in the array, 
returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy 
mode).|
 |`ARRAY_PREPEND(expr, arr)`|Prepends `expr` to `arr` at the beginning, the 
resulting array type determined by the type of `arr`.|
 |`ARRAY_APPEND(arr1, expr)`|Appends `expr` to `arr`, the resulting array type 
determined by the type of `arr1`.|
 |`ARRAY_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array 
type is determined by the type of `arr1`.|
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 36331e0144..6fb5cc0764 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -66,15 +66,14 @@ The following table describes how Druid maps SQL types onto 
native types when ru
 |ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and 
multi-value strings can be converted to arrays. See [Arrays](#arrays) for more 
information.|
 |OTHER|COMPLEX|none|May represent various Druid column types such as 
hyperUnique, approxHistogram, etc.|
 
-<sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull = 
true` (the default mode). Otherwise, the default value is `NULL` for all types.
+<sup>*</sup> The default value is `NULL` for all types, except in legacy mode 
(`druid.generic.useDefaultValueForNull = true`), which initializes a default 
value. 
 
 Casts between two SQL types with the same Druid runtime type have no effect 
other than the exceptions noted in the table.
 
 Casts between two SQL types that have different Druid runtime types generate a 
runtime cast in Druid.
 
-If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`, 
Druid either substitutes a default
-value (when `druid.generic.useDefaultValueForNull = true`, the default mode), 
or substitutes [NULL](#null-values) (when
-`druid.generic.useDefaultValueForNull = false`). NULL values cast to 
non-nullable types are also substituted with a default value. For example, if 
`druid.generic.useDefaultValueForNull = true`, a null VARCHAR cast to BIGINT is 
converted to a zero.
+If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`, 
Druid substitutes [NULL](#null-values).
+When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid 
instead substitutes a default value, including when NULL values cast to 
non-nullable types. For example, if `druid.generic.useDefaultValueForNull = 
true`, a null VARCHAR cast to BIGINT is converted to a zero.
 
 ## Multi-value strings
 
@@ -135,33 +134,33 @@ VARCHAR. ARRAY typed results will be serialized into 
stringified JSON arrays if
 ## NULL values
 
 The 
[`druid.generic.useDefaultValueForNull`](../configuration/index.md#sql-compatible-null-handling)
-runtime property controls Druid's NULL handling mode. For the most SQL 
compliant behavior, set this to `false`.
+runtime property controls Druid's NULL handling mode. For the most SQL 
compliant behavior, set this to `false` (the default).
 
-When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid 
treats NULLs and empty strings
+When `druid.generic.useDefaultValueForNull = false` (the default), NULLs are 
treated more closely to the SQL standard. In this mode,
+numeric NULL is permitted, and NULLs and empty strings are no longer treated 
as interchangeable. This property
+affects both storage and querying, and must be set on all Druid service types 
to be available at both ingestion time
+and query time. There is some overhead associated with the ability to handle 
NULLs; see
+the [segment internals](../design/segments.md#handling-null-values) 
documentation for more details.
+
+When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid treats 
NULLs and empty strings
 interchangeably, rather than according to the SQL standard. In this mode Druid 
SQL only has partial support for NULLs.
 For example, the expressions `col IS NULL` and `col = ''` are equivalent, and 
both evaluate to true if `col`
 contains an empty string. Similarly, the expression `COALESCE(col1, col2)` 
returns `col2` if `col1` is an empty
 string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` 
aggregator counts the number of rows
 where `expr` is neither null nor the empty string. Numeric columns in this 
mode are not nullable; any null or missing
-values are treated as zeroes.
-
-When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more 
closely to the SQL standard. In this mode,
-numeric NULL is permitted, and NULLs and empty strings are no longer treated 
as interchangeable. This property
-affects both storage and querying, and must be set on all Druid service types 
to be available at both ingestion time
-and query time. There is some overhead associated with the ability to handle 
NULLs; see
-the [segment internals](../design/segments.md#handling-null-values) 
documentation for more details.
+values are treated as zeroes. This was the default prior to Druid 28.0.0.
 
 ## Boolean logic
 
 The 
[`druid.expressions.useStrictBooleans`](../configuration/index.md#expression-processing-configurations)
 runtime property controls Druid's boolean logic mode. For the most SQL 
compliant behavior, set this to `true`.
 
-When `druid.expressions.useStrictBooleans = false` (the default mode), Druid 
uses two-valued logic.
-
-When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued 
logic for
+When `druid.expressions.useStrictBooleans = false` (the default mode),  Druid 
uses three-valued logic for
 [expressions](math-expr.md) evaluation, such as `expression` virtual columns 
or `expression` filters.
 However, even in this mode, Druid uses two-valued logic for filter types other 
than `expression`.
 
+When `druid.expressions.useStrictBooleans = true` (legacy mode), Druid uses 
two-valued logic.
+
 ## Nested columns
 
 Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. See [Nested columns](./nested-columns.md) for more 
information.
diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md
index 8821c642f4..f936610e16 100644
--- a/docs/querying/sql-functions.md
+++ b/docs/querying/sql-functions.md
@@ -185,7 +185,7 @@ Returns the array element at the 0-based index supplied, or 
null for an out of r
 
 **Function type:** [Array](./sql-array-functions.md)
 
-Returns the 0-based index of the first occurrence of `expr` in the array. If 
no matching elements exist in the array, returns `-1` or `null` if 
`druid.generic.useDefaultValueForNull=false`.
+Returns the 0-based index of the first occurrence of `expr` in the array. If 
no matching elements exist in the array, returns `null` or `-1` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode).
 
 ## ARRAY_ORDINAL
 
@@ -200,7 +200,7 @@ Returns the array element at the 1-based index supplied, or 
null for an out of r
 
 **Function type:** [Array](./sql-array-functions.md)
 
-Returns the 1-based index of the first occurrence of `expr` in the array. If 
no matching elements exist in the array, returns `-1` or `null` if 
`druid.generic.useDefaultValueForNull=false`.|
+Returns the 1-based index of the first occurrence of `expr` in the array. If 
no matching elements exist in the array, returns `null` or `-1` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode).
 
 ## ARRAY_OVERLAP
 
diff --git a/docs/querying/sql-metadata-tables.md 
b/docs/querying/sql-metadata-tables.md
index 23700e60a8..8e9bce9fad 100644
--- a/docs/querying/sql-metadata-tables.md
+++ b/docs/querying/sql-metadata-tables.md
@@ -234,7 +234,7 @@ Servers table lists all discovered servers in the cluster.
 |tier|VARCHAR|Distribution tier see 
[druid.server.tier](../configuration/index.md#historical-general-configuration).
 Only valid for HISTORICAL type, for other types it's null|
 |current_size|BIGINT|Current size of segments in bytes on this server. Only 
valid for HISTORICAL type, for other types it's 0|
 |max_size|BIGINT|Max size in bytes this server recommends to assign to 
segments see 
[druid.server.maxSize](../configuration/index.md#historical-general-configuration).
 Only valid for HISTORICAL type, for other types it's 0|
-|is_leader|BIGINT|1 if the server is currently the 'leader' (for services 
which have the concept of leadership), otherwise 0 if the server is not the 
leader, or the default long value (0 or null depending on 
`druid.generic.useDefaultValueForNull`) if the server type does not have the 
concept of leadership|
+|is_leader|BIGINT|1 if the server is currently the 'leader' (for services 
which have the concept of leadership), otherwise 0 if the server is not the 
leader, or the default long value (null or zero depending on 
`druid.generic.useDefaultValueForNull`) if the server type does not have the 
concept of leadership|
 |start_time|STRING|Timestamp in ISO8601 format when the server was announced 
in the cluster|
 To retrieve information about all servers, use the query:
 
diff --git a/docs/querying/sql-multivalue-string-functions.md 
b/docs/querying/sql-multivalue-string-functions.md
index 86c22abd83..9688ca083f 100644
--- a/docs/querying/sql-multivalue-string-functions.md
+++ b/docs/querying/sql-multivalue-string-functions.md
@@ -55,8 +55,8 @@ All array references in the multi-value string function 
documentation can refer
 |`MV_ORDINAL(arr, long)`|Returns the array element at the 1-based index 
supplied, or null for an out of range index.|
 |`MV_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` 
contains `expr`. If `expr` is an array, returns 1 if `arr` contains all 
elements of `expr`. Otherwise returns 0.|
 |`MV_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements in 
common, else 0.|
-|`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence 
of `expr` in the array. If no matching elements exist in the array, returns 
`-1` or `null` if `druid.generic.useDefaultValueForNull=false`.|
-|`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence 
of `expr` in the array. If no matching elements exist in the array, returns 
`-1` or `null` if `druid.generic.useDefaultValueForNull=false`.|
+|`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence 
of `expr` in the array. If no matching elements exist in the array, returns 
`null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy mode).|
+|`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence 
of `expr` in the array. If no matching elements exist in the array, returns 
`null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy mode).|
 |`MV_PREPEND(expr, arr)`|Adds `expr` to `arr` at the beginning, the resulting 
array type determined by the type of the array.|
 |`MV_APPEND(arr1, expr)`|Appends `expr` to `arr`, the resulting array type 
determined by the type of the first array.|
 |`MV_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array 
type is determined by the type of `arr1`.|
diff --git a/docs/querying/sql-query-context.md 
b/docs/querying/sql-query-context.md
index f9438363ad..dc192db171 100644
--- a/docs/querying/sql-query-context.md
+++ b/docs/querying/sql-query-context.md
@@ -46,7 +46,7 @@ Configure Druid SQL query planning using the parameters in 
the table below.
 |`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to 
TimeBoundary queries wherever possible. TimeBoundary queries are very efficient 
for min-max calculation on `__time` column in a datasource 
|`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker 
(default: false)|
 |`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain 
plan as a JSON representation of equivalent native query(s), else it will 
return the original version of explain plan generated by Calcite.<br /><br 
/>This property is provided for backwards compatibility. It is not recommended 
to use this parameter unless you were depending on the older 
behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: 
true)|
 |`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and 
later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in 
query results, as documented. If true (default behavior in Druid 24.0.1 and 
earlier), sketches from these functions are finalized when they appear in query 
results.<br /><br />This property is provided for backwards compatibility with 
behavior in Druid 24.0.1 and earlier. It is not recommended to use this 
parameter unless you were depending [...]
-|`sqlUseBoundAndSelectors`|If false (default behavior if 
`druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the 
SQL planner will use [equality](./filters.md#equality-filter), 
[null](./filters.md#null-filter), and [range](./filters.md#range-filter) 
filters instead of [selector](./filters.md#selector-filter) and 
[bounds](./filters.md#bound-filter). This value must be set to `false` for 
correct behavior for filtering `ARRAY` typed values. | Defaults to same value 
as `d [...]
+|`sqlUseBoundAndSelectors`|If false (default behavior if 
`druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the 
SQL planner will use [equality](./filters.md#equality-filter), 
[null](./filters.md#null-filter), and [range](./filters.md#range-filter) 
filters instead of [selector](./filters.md#selector-filter) and 
[bounds](./filters.md#bound-filter). This value must be set to `false` for 
correct behavior for filtering `ARRAY` typed values. | Defaults to same value 
as `d [...]
 
 ## Setting the query context
 The query context parameters can be specified as a "context" object in the 
[JSON API](../api-reference/sql-api.md) or as a [JDBC connection properties 
object](../api-reference/sql-jdbc.md).
diff --git 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query1.json
 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query1.json
index 151fb54aaf..32c3d592d6 100644
--- 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query1.json
+++ 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query1.json
@@ -4,7 +4,7 @@
     "expectedResults": [
       {
         "__time": 1377910953000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 57,
         "delta": -143,
         "deleted": 200,
@@ -12,7 +12,7 @@
       },
       {
         "__time": 1377919965000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 459,
         "delta": 330,
         "deleted": 129,
@@ -20,7 +20,7 @@
       },
       {
         "__time": 1377933081000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 123,
         "delta": 111,
         "deleted": 12,
diff --git 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_ha.json
 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_ha.json
index 58c3825072..992eda01a2 100644
--- 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_ha.json
+++ 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_ha.json
@@ -4,7 +4,7 @@
     "expectedResults": [
       {
         "__time": 1377910953000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 57,
         "delta": -143,
         "deleted": 200,
@@ -12,7 +12,7 @@
       },
       {
         "__time": 1377910953000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 57,
         "delta": -143,
         "deleted": 200,
@@ -20,7 +20,7 @@
       },
       {
         "__time": 1377919965000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 459,
         "delta": 330,
         "deleted": 129,
@@ -28,7 +28,7 @@
       },
       {
         "__time": 1377919965000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 459,
         "delta": 330,
         "deleted": 129,
@@ -36,7 +36,7 @@
       },
       {
         "__time": 1377933081000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 123,
         "delta": 111,
         "deleted": 12,
@@ -44,7 +44,7 @@
       },
       {
         "__time": 1377933081000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 123,
         "delta": 111,
         "deleted": 12,
diff --git 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_sequential_test.json
 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_sequential_test.json
index c50ea09ad2..6987f8cdb8 100644
--- 
a/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_sequential_test.json
+++ 
b/integration-tests-ex/cases/src/test/resources/multi-stage-query/wikipedia_msq_select_query_sequential_test.json
@@ -4,7 +4,7 @@
     "expectedResults": [
       {
         "__time": 1377933081000,
-        "isRobot": "",
+        "isRobot": null,
         "added": 123,
         "delta": 111,
         "deleted": 12,
diff --git 
a/integration-tests/src/main/java/org/apache/druid/testing/utils/AbstractTestQueryHelper.java
 
b/integration-tests/src/main/java/org/apache/druid/testing/utils/AbstractTestQueryHelper.java
index 7f2773a898..f680ad909a 100644
--- 
a/integration-tests/src/main/java/org/apache/druid/testing/utils/AbstractTestQueryHelper.java
+++ 
b/integration-tests/src/main/java/org/apache/druid/testing/utils/AbstractTestQueryHelper.java
@@ -186,7 +186,8 @@ public abstract class 
AbstractTestQueryHelper<QueryResultType extends AbstractQu
     } else {
       Map<String, Object> map = (Map<String, Object>) 
results.get(0).get("result");
 
-      return (Integer) map.get("rows");
+      Integer rowCount = (Integer) map.get("rows");
+      return rowCount == null ? 0 : rowCount;
     }
   }
 }
diff --git 
a/integration-tests/src/test/java/org/apache/druid/tests/coordinator/duty/ITAutoCompactionTest.java
 
b/integration-tests/src/test/java/org/apache/druid/tests/coordinator/duty/ITAutoCompactionTest.java
index 3c40affa78..26df03e0d8 100644
--- 
a/integration-tests/src/test/java/org/apache/druid/tests/coordinator/duty/ITAutoCompactionTest.java
+++ 
b/integration-tests/src/test/java/org/apache/druid/tests/coordinator/duty/ITAutoCompactionTest.java
@@ -466,8 +466,8 @@ public class ITAutoCompactionTest extends 
AbstractIndexerTest
           fullDatasourceName,
           AutoCompactionSnapshot.AutoCompactionScheduleStatus.RUNNING,
           0,
-          13702,
-          13701,
+          14166,
+          14165,
           0,
           2,
           2,
@@ -484,7 +484,7 @@ public class ITAutoCompactionTest extends 
AbstractIndexerTest
           fullDatasourceName,
           AutoCompactionSnapshot.AutoCompactionScheduleStatus.RUNNING,
           0,
-          21566,
+          22262,
           0,
           0,
           3,
@@ -600,8 +600,8 @@ public class ITAutoCompactionTest extends 
AbstractIndexerTest
       getAndAssertCompactionStatus(
           fullDatasourceName,
           AutoCompactionSnapshot.AutoCompactionScheduleStatus.RUNNING,
-          13702,
-          13701,
+          14166,
+          14165,
           0,
           2,
           2,
@@ -609,7 +609,7 @@ public class ITAutoCompactionTest extends 
AbstractIndexerTest
           1,
           1,
           0);
-      
Assert.assertEquals(compactionResource.getCompactionProgress(fullDatasourceName).get("remainingSegmentSize"),
 "13702");
+      
Assert.assertEquals(compactionResource.getCompactionProgress(fullDatasourceName).get("remainingSegmentSize"),
 "14166");
       // Run compaction again to compact the remaining day
       // Remaining day compacted (1 new segment). Now both days compacted (2 
total)
       forceTriggerAutoCompaction(2);
@@ -620,7 +620,7 @@ public class ITAutoCompactionTest extends 
AbstractIndexerTest
           fullDatasourceName,
           AutoCompactionSnapshot.AutoCompactionScheduleStatus.RUNNING,
           0,
-          21566,
+          22262,
           0,
           0,
           3,
diff --git 
a/integration-tests/src/test/resources/queries/wikipedia_editstream_queries.json
 
b/integration-tests/src/test/resources/queries/wikipedia_editstream_queries.json
index 59a5c6ca70..0d0290d232 100644
--- 
a/integration-tests/src/test/resources/queries/wikipedia_editstream_queries.json
+++ 
b/integration-tests/src/test/resources/queries/wikipedia_editstream_queries.json
@@ -1410,7 +1410,7 @@
                         "minValue":"",
                         "maxValue":"mmx._unknown",
                         "errorMessage":null,
-                        "hasNulls":true
+                        "hasNulls":false
                     },
                     "language":{
                         "typeSignature": "STRING",
diff --git 
a/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
 
b/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
index fbdc852105..fdd13d6a57 100644
--- 
a/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
+++ 
b/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
@@ -45,7 +45,7 @@ public class NullValueHandlingConfig
   )
   {
     if (useDefaultValuesForNull == null) {
-      this.useDefaultValuesForNull = 
Boolean.valueOf(System.getProperty(NULL_HANDLING_CONFIG_STRING, "true"));
+      this.useDefaultValuesForNull = 
Boolean.valueOf(System.getProperty(NULL_HANDLING_CONFIG_STRING, "false"));
     } else {
       this.useDefaultValuesForNull = useDefaultValuesForNull;
     }
diff --git 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregator.java
 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregator.java
index 8a6654fbfd..0d05833378 100644
--- 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregator.java
+++ 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregator.java
@@ -56,9 +56,6 @@ public class StringFirstAggregator implements Aggregator
   @Override
   public void aggregate()
   {
-    if (timeSelector.isNull()) {
-      return;
-    }
     if (needsFoldCheck) {
       // Less efficient code path when folding is a possibility (we must read 
the value selector first just in case
       // it's a foldable object).
@@ -72,6 +69,9 @@ public class StringFirstAggregator implements Aggregator
         firstValue = StringUtils.fastLooseChop(inPair.rhs, maxStringBytes);
       }
     } else {
+      if (timeSelector.isNull()) {
+        return;
+      }
       final long time = timeSelector.getLong();
 
       if (time < firstTime) {
diff --git 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstBufferAggregator.java
 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstBufferAggregator.java
index fbf2a4156c..563455c9ee 100644
--- 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstBufferAggregator.java
+++ 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstBufferAggregator.java
@@ -63,9 +63,6 @@ public class StringFirstBufferAggregator implements 
BufferAggregator
   @Override
   public void aggregate(ByteBuffer buf, int position)
   {
-    if (timeSelector.isNull()) {
-      return;
-    }
     if (needsFoldCheck) {
       // Less efficient code path when folding is a possibility (we must read 
the value selector first just in case
       // it's a foldable object).
@@ -86,6 +83,9 @@ public class StringFirstBufferAggregator implements 
BufferAggregator
         }
       }
     } else {
+      if (timeSelector.isNull()) {
+        return;
+      }
       final long time = timeSelector.getLong();
       final long firstTime = buf.getLong(position);
 
diff --git 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java
 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java
index b61a78a7c9..ee04f2c698 100644
--- 
a/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java
+++ 
b/processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java
@@ -120,6 +120,9 @@ public class StringFirstLastUtils
       time = pair.lhs;
       string = pair.rhs;
     } else if (object != null) {
+      if (timeSelector.isNull()) {
+        return null;
+      }
       time = timeSelector.getLong();
       string = DimensionHandlerUtils.convertObjectToString(object);
     } else {
diff --git 
a/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastAggregator.java
 
b/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastAggregator.java
index a7c33c8ad2..f1dbab6093 100644
--- 
a/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastAggregator.java
+++ 
b/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastAggregator.java
@@ -57,9 +57,6 @@ public class StringLastAggregator implements Aggregator
   @Override
   public void aggregate()
   {
-    if (timeSelector.isNull()) {
-      return;
-    }
     if (needsFoldCheck) {
       // Less efficient code path when folding is a possibility (we must read 
the value selector first just in case
       // it's a foldable object).
@@ -73,6 +70,9 @@ public class StringLastAggregator implements Aggregator
         lastValue = StringUtils.fastLooseChop(inPair.rhs, maxStringBytes);
       }
     } else {
+      if (timeSelector.isNull()) {
+        return;
+      }
       final long time = timeSelector.getLong();
 
       if (time >= lastTime) {
diff --git 
a/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastBufferAggregator.java
 
b/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastBufferAggregator.java
index 8611ef7236..3f78745f5f 100644
--- 
a/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastBufferAggregator.java
+++ 
b/processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastBufferAggregator.java
@@ -64,9 +64,6 @@ public class StringLastBufferAggregator implements 
BufferAggregator
   @Override
   public void aggregate(ByteBuffer buf, int position)
   {
-    if (timeSelector.isNull()) {
-      return;
-    }
     if (needsFoldCheck) {
       // Less efficient code path when folding is a possibility (we must read 
the value selector first just in case
       // it's a foldable object).
@@ -87,6 +84,9 @@ public class StringLastBufferAggregator implements 
BufferAggregator
         }
       }
     } else {
+      if (timeSelector.isNull()) {
+        return;
+      }
       final long time = timeSelector.getLong();
       final long lastTime = buf.getLong(position);
 

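The effect of the `NullValueHandlingConfig` change above can be illustrated with a small standalone sketch. It is not Druid code: the class name `NullHandlingDefaultSketch` and the method `resolve` are hypothetical, and the property key is assumed to be `druid.generic.useDefaultValueForNull` (the diff only shows the constant name `NULL_HANDLING_CONFIG_STRING`). It mirrors the fallback pattern in the hunk: an explicit setting wins, otherwise the system property is read, and if that is absent the fallback is now `"false"` (SQL-compatible mode).

```java
// Minimal sketch of the default-resolution pattern shown in the diff.
// Hypothetical names throughout; this is not the Druid class itself.
class NullHandlingDefaultSketch {
    // Assumed property key; the diff only shows NULL_HANDLING_CONFIG_STRING.
    static final String KEY = "druid.generic.useDefaultValueForNull";

    // An explicit value wins; otherwise fall back to the system property,
    // and if that is unset, to "false" (the new default after this commit).
    static boolean resolve(Boolean configured) {
        if (configured == null) {
            return Boolean.parseBoolean(System.getProperty(KEY, "false"));
        }
        return configured;
    }

    public static void main(String[] args) {
        System.clearProperty(KEY);
        System.out.println(resolve(null));          // false: SQL-compatible mode

        System.setProperty(KEY, "true");
        System.out.println(resolve(null));          // true: legacy mode via property

        System.out.println(resolve(Boolean.FALSE)); // false: explicit value wins
    }
}
```

Under these assumptions, a deployment that never set the property switches from legacy mode to SQL-compatible mode after this commit, while one that sets it explicitly keeps its configured behavior.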
