This is an automated email from the ASF dual-hosted git repository. cwylie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push: new e805c9a1063 update docs after legacy configs have been removed (#17634) e805c9a1063 is described below commit e805c9a1063bd9108ca820dd31e62f2a672a648a Author: Clint Wylie <cwy...@apache.org> AuthorDate: Thu Jan 16 17:15:20 2025 -0800 update docs after legacy configs have been removed (#17634) * update docs after legacy configs have been removed * better runtime.properties validation with new StartupInjectorBuilder.PropertiesValidator to fail fast if removed properties are set --- docs/api-reference/service-status-api.md | 1 - docs/configuration/index.md | 20 ------ docs/design/segments.md | 11 +-- .../extensions-contrib/moving-average-query.md | 2 +- .../extensions-core/approximate-histograms.md | 4 +- docs/development/extensions-core/stats.md | 12 ++-- docs/ingestion/schema-design.md | 3 +- docs/querying/aggregations.md | 14 ++-- docs/querying/filters.md | 6 +- docs/querying/lookups.md | 6 +- docs/querying/math-expr.md | 22 ++---- docs/querying/sql-aggregations.md | 40 +++++------ docs/querying/sql-array-functions.md | 4 +- docs/querying/sql-data-types.md | 46 +----------- docs/querying/sql-functions.md | 4 +- docs/querying/sql-metadata-tables.md | 2 +- docs/querying/sql-multivalue-string-functions.md | 4 +- docs/querying/sql-query-context.md | 2 +- docs/querying/sql-scalar.md | 4 +- docs/release-info/migr-ansi-sql-null.md | 40 ++++------- docs/release-info/upgrade-notes.md | 2 +- .../common/config/NullValueHandlingConfig.java | 19 ----- .../apache/druid/guice/StartupInjectorBuilder.java | 82 +++++++++++++++++++++- .../math/expr/ExpressionProcessingConfig.java | 13 ---- .../druid/guice/StartupInjectorBuilderTest.java | 59 ++++++++++++++++ 25 files changed, 216 insertions(+), 206 deletions(-) diff --git a/docs/api-reference/service-status-api.md b/docs/api-reference/service-status-api.md index efa5ebe01c9..6e801ae632e 100644 --- a/docs/api-reference/service-status-api.md +++ 
b/docs/api-reference/service-status-api.md @@ -400,7 +400,6 @@ Host: http://ROUTER_IP:ROUTER_PORT "druid.emitter": "noop", "sun.io.unicode.encoding": "UnicodeBig", "druid.storage.type": "local", - "druid.expressions.useStrictBooleans": "true", "java.class.version": "55.0", "socksNonProxyHosts": "local|*.local|169.254/16|*.169.254/16", "druid.server.hiddenProperties": "[\"druid.s3.accessKey\",\"druid.s3.secretKey\",\"druid.metadata.storage.connector.password\", \"password\", \"key\", \"token\", \"pwd\"]" diff --git a/docs/configuration/index.md b/docs/configuration/index.md index 26601c6db40..710453f3714 100644 --- a/docs/configuration/index.md +++ b/docs/configuration/index.md @@ -823,20 +823,6 @@ Support for 64-bit floating point columns was released in Druid 0.11.0, so if yo |--------|-----------|-------| |`druid.indexing.doubleStorage`|Set to "float" to use 32-bit double representation for double columns.|double| -### SQL compatible null handling - -These configurations are deprecated and will be removed in a future release at which point Druid will always have SQl compatible null handling. - -Prior to version 0.13.0, Druid string columns treated `''` and `null` values as interchangeable, and numeric columns were unable to represent `null` values, coercing `null` to `0`. Druid 0.13.0 introduced a mode which enabled SQL compatible null handling, allowing string columns to distinguish empty strings from nulls, and numeric columns to contain null rows. - -|Property|Description|Default| -|--------|-----------|-------| -|`druid.generic.useDefaultValueForNull`|Set to `false` to store and query data in SQL compatible mode. This configuration has been deprecated and will be removed in a future release, taking on the `false` behavior. 
When set to `true` (deprecated legacy mode), `null` values will be stored as `''` for string columns and `0` for numeric columns.|`false`| -|`druid.generic.useThreeValueLogicForNativeFilters`|Set to `true` to use SQL compatible three-value logic when processing native Druid filters when `druid.generic.useDefaultValueForNull=false` and `druid.expressions.useStrictBooleans=true`. This configuration has been deprecated and will be removed in a future release, taking on the `true` behavior. When set to `false` Druid uses 2 value logic for filter processing, even when `druid.generic.useDefaultValueForNull=false` and `druid.expres [...] -|`druid.generic.ignoreNullsForStringCardinality`|When set to `true`, `null` values will be ignored for the built-in cardinality aggregator over string columns. Set to `false` to include `null` values while estimating cardinality of only string columns using the built-in cardinality aggregator. This setting takes effect only when `druid.generic.useDefaultValueForNull` is set to `true` and is ignored in SQL compatibility mode. Additionally, empty strings (equivalent to null) are not counte [...] - -This mode does have a storage size and query performance cost, see [segment documentation](../design/segments.md#handling-null-values) for more details. - ### HTTP client All Druid components can communicate with each other over HTTP. @@ -2213,12 +2199,6 @@ Supported query contexts: |`sortByDimsFirst`|Sort the results first by dimension values and then by timestamp.|false| |`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the broker will push limit application down to the Historical processes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. 
Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false| -#### Expression processing configurations - -|Key|Description|Default| -|---|-----------|-------| -|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values are either `1` or `0`. This configuration has been deprecated and will be removed in a future release, taking on the `true` behavior. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true| - ### Router #### Router process configs diff --git a/docs/design/segments.md b/docs/design/segments.md index b6d3d16a3ae..6d2d9b5badb 100644 --- a/docs/design/segments.md +++ b/docs/design/segments.md @@ -82,16 +82,7 @@ For each row in the list of column data, there is only a single bitmap that has ## Handling null values -By default Druid stores segments in a SQL compatible null handling mode. String columns always store the null value as id 0, the first position in the value dictionary and an associated entry in the bitmap value indexes used to filter null values. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values. - -Druid also has a legacy mode which uses default values instead of nulls, which was the default prior to Druid 28.0.0. This legacy mode is deprecated and will be removed in a future release, but can be enabled by setting `druid.generic.useDefaultValueForNull=true`. - -In legacy mode, Druid segments created _at ingestion time_ have the following characteristics: - -* String columns can not distinguish `''` from `null`, they are treated interchangeably as the same value -* Numeric columns can not represent `null` valued rows, and instead store a `0`. 
- -In legacy mode, numeric columns do not have the null value bitmap, and so can have slightly decreased segment sizes, and queries involving numeric columns can have slightly increased performance in some cases since there is no need to check the null value bitmap. +String columns always store the null value, if present in any row, as id 0: the first position in the value dictionary, with an associated entry in the bitmap value indexes used to filter null values. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values. ## Segments with different schemas diff --git a/docs/development/extensions-contrib/moving-average-query.md b/docs/development/extensions-contrib/moving-average-query.md index 54bf2f32588..0fcc9c4f5bc 100644 --- a/docs/development/extensions-contrib/moving-average-query.md +++ b/docs/development/extensions-contrib/moving-average-query.md @@ -73,7 +73,7 @@ There are currently no configuration properties specific to Moving Average. ## Limitations * movingAverage is missing support for the following groupBy properties: `subtotalsSpec`, `virtualColumns`. * movingAverage is missing support for the following timeseries properties: `descending`. -* movingAverage is missing support for [SQL-compatible null handling](https://github.com/apache/druid/issues/4349) (So setting druid.generic.useDefaultValueForNull in configuration will give an error). +* movingAverage averagers consider empty buckets and null aggregation values as 0 unless otherwise noted. ## Query spec * Most properties in the query spec derived from [groupBy query](../../querying/groupbyquery.md) / [timeseries](../../querying/timeseriesquery.md), see documentation for these query types.
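As an illustrative aside (not part of the patch): the segment storage layout that segments.md describes above, with null always occupying dictionary id 0 in string columns and a null bitmap tracking null rows, can be sketched roughly like this. The function name and representation are hypothetical simplifications, not Druid internals; note that `''` and `null` remain distinct values.

```python
# Hedged sketch (NOT Druid internals): a dictionary-encoded string column.
# If any row is null, null takes dictionary id 0, the first position, and
# a null bitmap records which rows are null for fast null filtering.
def encode_string_column(rows):
    values = sorted({v for v in rows if v is not None})
    # reserve id 0 for null only when null actually appears in the data
    dictionary = ([None] + values) if any(v is None for v in rows) else values
    ids = [dictionary.index(v) for v in rows]
    null_bitmap = [i for i, v in enumerate(rows) if v is None]
    return dictionary, ids, null_bitmap

dictionary, ids, null_rows = encode_string_column(["b", None, "a", ""])
# null is id 0; the empty string is a separate, distinct dictionary entry
```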
diff --git a/docs/development/extensions-core/approximate-histograms.md b/docs/development/extensions-core/approximate-histograms.md index ee6455e47d5..b92190361df 100644 --- a/docs/development/extensions-core/approximate-histograms.md +++ b/docs/development/extensions-core/approximate-histograms.md @@ -226,9 +226,7 @@ For performance and accuracy reasons, we recommend avoiding aggregation of histo ### Null handling -If `druid.generic.useDefaultValueForNull` is false, null values will be tracked in the `missingValueCount` field of the histogram. - -If `druid.generic.useDefaultValueForNull` is true, null values will be added to the histogram as the default 0.0 value. +Druid tracks null values in the `missingValueCount` field of the histogram. ## Histogram post-aggregators diff --git a/docs/development/extensions-core/stats.md b/docs/development/extensions-core/stats.md index e0df34a004e..280c3d5b6d8 100644 --- a/docs/development/extensions-core/stats.md +++ b/docs/development/extensions-core/stats.md @@ -62,12 +62,12 @@ You can use the variance and standard deviation aggregation functions in the SEL |Function|Notes|Default| |--------|-----|-------| -|`VAR_POP(expr)`|Computes variance population of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`VAR_SAMP(expr)`|Computes variance sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`VARIANCE(expr)`|Computes variance sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV(expr)`|Computes standard deviation sample of `expr`.|`null` or `0` if 
`druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| +|`VAR_POP(expr)`|Computes variance population of `expr`.|`null`| +|`VAR_SAMP(expr)`|Computes variance sample of `expr`.|`null`| +|`VARIANCE(expr)`|Computes variance sample of `expr`.|`null`| +|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`.|`null`| +|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`.|`null`| +|`STDDEV(expr)`|Computes standard deviation sample of `expr`.|`null`| ### Pre-aggregating variance at ingestion time diff --git a/docs/ingestion/schema-design.md b/docs/ingestion/schema-design.md index eeb082d95ec..05c71151ca5 100644 --- a/docs/ingestion/schema-design.md +++ b/docs/ingestion/schema-design.md @@ -258,8 +258,7 @@ You can have Druid infer the schema and types for your data partially or fully b When performing type-aware schema discovery, Druid can discover all the columns of your input data (that are not present in the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`, `DOUBLE`, `ARRAY<STRING>`, `ARRAY<LONG>`, `ARRAY<DOUBLE>`, or `COMPLEX<json>` for nested data. For input formats with -native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true` -(the default) or strings if set to `false`. Array typed columns can be queried using +native boolean types, Druid ingests these values as longs. Array typed columns can be queried using the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested columns can be queried with the [JSON functions](../querying/sql-json-functions.md). 
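As an illustrative aside (not part of the patch): the schema-design.md change above states that type-aware schema discovery now always ingests native boolean input values as longs. A minimal sketch of that mapping, with a hypothetical helper name that is not Druid code:

```python
# Hedged sketch (NOT Druid code): under type-aware schema discovery,
# native booleans from the input format are ingested as LONG 1/0.
def ingest_discovered_value(v):
    if isinstance(v, bool):  # check bool before int: True is also an int
        return 1 if v else 0
    return v

row = {"flag": True, "count": 7, "name": "x"}
ingested = {k: ingest_discovered_value(v) for k, v in row.items()}
```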
diff --git a/docs/querying/aggregations.md b/docs/querying/aggregations.md index 8ef8287a982..c7b7d4e4efc 100644 --- a/docs/querying/aggregations.md +++ b/docs/querying/aggregations.md @@ -103,7 +103,7 @@ Example: #### `doubleMin` aggregator -`doubleMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Double.POSITIVE_INFINITY if true. +`doubleMin` computes the minimum of all input values, or `null` if no values are aggregated. Example: ```json @@ -112,7 +112,7 @@ Example: #### `doubleMax` aggregator -`doubleMax` computes the maximum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Double.NEGATIVE_INFINITY if true. +`doubleMax` computes the maximum of all input values, or `null` if no values are aggregated. Example: ```json @@ -121,7 +121,7 @@ Example: #### `floatMin` aggregator -`floatMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Float.POSITIVE_INFINITY if true. +`floatMin` computes the minimum of all input values, or `null` if no values are aggregated. Example: ```json @@ -130,7 +130,7 @@ Example: #### `floatMax` aggregator -`floatMax` computes the maximum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Float.NEGATIVE_INFINITY if true. +`floatMax` computes the maximum of all input values, or `null` if no values are aggregated. Example: ```json @@ -139,7 +139,7 @@ Example: #### `longMin` aggregator -`longMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Long.MAX_VALUE if true. +`longMin` computes the minimum of all input values, or `null` if no values are aggregated. Example: ```json @@ -148,7 +148,7 @@ Example: #### `longMax` aggregator -`longMax` computes the maximum of all metric values and null if `druid.generic.useDefaultValueForNull` is false or Long.MIN_VALUE if true. +`longMax` computes the maximum of all metric values, or `null` if no values are aggregated. Example: ```json @@ -485,7 +485,7 @@ Aggregator applicable only at query time.
Aggregates results using [Druid expres | `finalize` | The finalize expression which can only refer to a single input variable, `o`. This expression is used to perform any final transformation of the output of the `fold` or `combine` expressions. If not set, then the value is not transformed. | No | | `initialValue` | The initial value of the accumulator for the `fold` (and `combine`, if `InitialCombineValue` is null) expression. | Yes | | `initialCombineValue` | The initial value of the accumulator for the `combine` expression. | No. Default `initialValue`. | -| `isNullUnlessAggregated` | Indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | No. Defaults to the value of `druid.generic.useDefaultValueForNull`. | +| `isNullUnlessAggregated` | If true, sets the default output value to `null` when the aggregator does not process any rows. If false, Druid computes the value as the result of running the expressions with initial values. | No. Defaults to `true`. | | `shouldAggregateNullInputs` | Indicates if the `fold` expression should operate on any `null` input values. | No. Defaults to `true`. | | `shouldCombineAggregateNullInputs` | Indicates if the `combine` expression should operate on any `null` input values. | No. Defaults to the value of `shouldAggregateNullInputs`. | | `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No. Default is 8192 bytes. 
| diff --git a/docs/querying/filters.md b/docs/querying/filters.md index ae84ecbfbaf..3e594313979 100644 --- a/docs/querying/filters.md +++ b/docs/querying/filters.md @@ -68,7 +68,7 @@ When the selector filter matches against numeric inputs, the string `value` will The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter is designed to have more SQL compatible behavior than the selector filter and so can not match null values. To match null values use the null filter. -Druid's SQL planner uses the equality filter by default instead of selector filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md). +Druid's SQL planner uses the equality filter by default instead of the selector filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md). | Property | Description | Required | | -------- | ----------- | -------- | @@ -99,7 +99,7 @@ Druid's SQL planner uses the equality filter by default instead of selector filt The null filter is a partial replacement for the selector filter. It is dedicated to matching NULL values. -Druid's SQL planner uses the null filter by default instead of selector filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md). +Druid's SQL planner uses the null filter by default instead of the selector filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md). | Property | Description | Required | | -------- | ----------- | -------- | @@ -310,7 +310,7 @@ Note that the bound filter matches null values if you don't specify a lower boun The range filter is a replacement for the bound filter.
It compares against any type of column and is designed to have more SQL compliant behavior than the bound filter. It won't match null values, even if you don't specify a lower bound. -Druid's SQL planner uses the range filter by default instead of bound filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md). +Druid's SQL planner uses the range filter by default instead of the bound filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md). | Property | Description | Required | | -------- | ----------- | -------- | diff --git a/docs/querying/lookups.md b/docs/querying/lookups.md index 05176f229b1..2b5af42aa4a 100644 --- a/docs/querying/lookups.md +++ b/docs/querying/lookups.md @@ -210,11 +210,7 @@ Injective lookups are eligible for the largest set of query rewrites. Injective - The lookup table must have a key-value pair defined for every input that the `LOOKUP` function call may encounter. For example, when calling `LOOKUP(sku, 'sku_to_name')`, the `sku_to_name` lookup table must have a key for all possible `sku`. -- In SQL-compatible null handling mode (when `druid.generic.useDefaultValueForNull = false`, the default) injective - lookup tables are not required to have keys for `null`, since `LOOKUP` of `null` is always `null` itself. -- When `druid.generic.useDefaultValueForNull = true`, a `LOOKUP` of `null` retrieves the value mapped to the - empty-string key (`""`). In this mode, injective lookup tables must have an empty-string key if the `LOOKUP` - function may encounter null input values. +- Injective lookup tables are not required to have keys for `null`, since `LOOKUP` of `null` is always `null` itself. To determine whether a lookup is injective, Druid relies on an `injective` property that you can set in the [lookup definition](./lookups-cached-global.md).
In general, you should set diff --git a/docs/querying/math-expr.md b/docs/querying/math-expr.md index 84494acfaab..ba0e3cf24a5 100644 --- a/docs/querying/math-expr.md +++ b/docs/querying/math-expr.md @@ -162,7 +162,7 @@ See javadoc of java.lang.Math for detailed explanation for each function. |remainder|remainder(x, y) returns the remainder operation on two arguments as prescribed by the IEEE 754 standard| |rint|rint(x) returns value that is closest in value to x and is equal to a mathematical integer| |round|round(x, y) returns the value of the x rounded to the y decimal places. While x can be an integer or floating-point number, y must be an integer. The type of the return value is specified by that of x. y defaults to 0 if omitted. When y is negative, x is rounded on the left side of the y decimal points. If x is `NaN`, x returns 0. If x is infinity, x will be converted to the nearest finite double. | -|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal to 0. In case y is 0 it returns `null` or 0 if `druid.generic.useDefaultValueForNull=true` (legacy mode) | +|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal to 0. Returns `null` if y is 0.| |scalb|scalb(d, sf) returns d * 2^sf rounded as if performed by a single correctly rounded floating-point multiply to a member of the double value set| |signum|signum(x) returns the signum function of the argument x| |sin|sin(x) returns the trigonometric sine of an angle x| @@ -185,8 +185,8 @@ See javadoc of java.lang.Math for detailed explanation for each function. 
| array_contains(arr,expr) | returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0 | | array_overlap(arr1,arr2) | returns 1 if arr1 and arr2 have any elements in common, else 0 | | scalar_in_array(expr, arr) | returns 1 if the scalar is present in the array, else 0 if the expr is non-null, or null if the expr is null | -| array_offset_of(arr,expr) | returns the 0 based index of the first occurrence of expr in the array, or `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode) if no matching elements exist in the array. | -| array_ordinal_of(arr,expr) | returns the 1 based index of the first occurrence of expr in the array, or `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode) if no matching elements exist in the array. | +| array_offset_of(arr,expr) | returns the 0 based index of the first occurrence of expr in the array, or `null` if no matching elements exist in the array. | +| array_ordinal_of(arr,expr) | returns the 1 based index of the first occurrence of expr in the array, or `null` if no matching elements exist in the array. | | array_prepend(expr,arr) | adds expr to arr at the beginning, the resulting array type determined by the type of the array | | array_append(arr,expr) | appends expr to arr, the resulting array type determined by the type of the first array | | array_concat(arr1,arr2) | concatenates 2 arrays, the resulting array type determined by the type of the first array | @@ -315,8 +315,8 @@ Supported features: * string functions: the concatenation operator (`+`) and `concat` function are supported for string and numeric types * other: `parse_long` is supported for numeric and string types -## Logical operator modes -In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default. Logical operations treat `null` values as "unknown" for SQL compatible behavior. 
_All boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. +## Logical operator behavior +Logical operations treat `null` values as "unknown" for SQL compatible behavior. _All boolean output functions_ output `LONG` typed boolean values of `1` for `true` and `0` for `false`. For the "or" operator: * `true || null`, `null || true`, -> `1` @@ -326,22 +326,14 @@ For the "and" operator: * `true && null`, `null && true`, `null && null` -> `null` * `false && null`, `null && false` -> `0` -Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values in both modes: +Druid allows implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values: * `LONG` or `DOUBLE`: any value greater than 0 is considered `true`, else `false`. * `STRING`: the value `'true'` (case insensitive) is considered `true`, everything else is `false`. -SQL compatible behavior: +Behavior examples: * `100 && 11` -> `1` * `0.7 || 0.3` -> `1` * `100 && 0` -> `0` * `'troo' && 'true'` -> `0` * `'troo' || 'true'` -> `1` -Prior to Druid 28.0.0, `druid.expressions.useStrictBooleans=false` was the default. In this mode, boolean function expressions have inconsistent handling of true and false values. The logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values, similar to many scripting languages, and treat `null` as false, whic [...] 
- -Legacy behavior: -* `100 && 11` -> `11` -* `0.7 || 0.3` -> `0.3` -* `100 && 0` -> `0` -* `'troo' && 'true'` -> `'troo'` -* `'troo' || 'true'` -> `'true'` diff --git a/docs/querying/sql-aggregations.md b/docs/querying/sql-aggregations.md index 3f6e7466649..4d252d82167 100644 --- a/docs/querying/sql-aggregations.md +++ b/docs/querying/sql-aggregations.md @@ -71,36 +71,36 @@ In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and |--------|-----|-------| |`COUNT(*)`|Counts the number of rows.|`0`| |`COUNT([DISTINCT] expr)`|Counts the values of `expr`.<br /><br />By default, using DISTINCT serves as an alias for `APPROX_COUNT_DISTINCT` (`useApproximateCountDistinct=true`). The specific algorithm depends on the value of [`druid.sql.approxCountDistinct.function`](../configuration/index.md#sql). In this mode, you can use strings, numbers, or prebuilt sketches. If counting prebuilt sketches, the prebuilt sketch type must match the selected algorithm.<br /><br />When `useApproximateCoun [...] -|`SUM(expr)`|Sums numbers.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`MIN(expr)`|Takes the minimum of numbers.|`null` or `9223372036854775807` (maximum LONG value) if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`MAX(expr)`|Takes the maximum of numbers.|`null` or `-9223372036854775808` (minimum LONG value) if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`AVG(expr)`|Averages numbers.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| +|`SUM(expr)`|Sums numbers.|`null`| +|`MIN(expr)`|Takes the minimum of numbers.|`null`| +|`MAX(expr)`|Takes the maximum of numbers.|`null`| +|`AVG(expr)`|Averages numbers.|`null`| |`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of `expr` using an approximate algorithm. 
The `expr` can be a regular column or a prebuilt sketch column.<br /><br />The specific algorithm depends on the value of [`druid.sql.approxCountDistinct.function`](../configuration/index.md#sql). By default, this is `APPROX_COUNT_DISTINCT_BUILTIN`. If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, you can set it to `APPROX_COUNT_DISTINCT_DS_H [...] |`APPROX_COUNT_DISTINCT_BUILTIN(expr)`|_Usage note:_ consider using `APPROX_COUNT_DISTINCT_DS_HLL` instead, which offers better accuracy in many cases.<br/><br/>Counts distinct values of `expr` using Druid's built-in "cardinality" or "hyperUnique" aggregators, which implement a variant of [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). The `expr` can be a string, a number, or a prebuilt hyperUnique column. Results are always approximate, regardless of the value [...] |`APPROX_QUANTILE(expr, probability, [resolution])`|_Deprecated._ Use `APPROX_QUANTILE_DS` instead, which provides a superior distribution-independent algorithm with formal error guarantees.<br/><br/>Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator) expressions. `probability` should be between 0 and 1, exclusive. `resolution` is the number of centroids to use for the computation. Highe [...] |`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric or [fixed buckets histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram) expressions. `probability` should be between 0 and 1, exclusive. The `numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. Load the [approximat [...] 
|`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced by `expr`, with `numEntries` maximum number of distinct values before false positive rate increases. See [bloom filter extension](../development/extensions-core/bloom-filter.md) documentation for additional details.|Empty base64 encoded bloom filter STRING| -|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`STDDEV(expr)`|Computes standard deviation sample of `expr`. 
See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)| -|`EARLIEST(expr, [maxBytesPerValue])`|Returns the earliest value of `expr`.<br />If `expr` comes from a relation with a timestamp column (like `__time` in a Druid datasource), the "earliest" is taken from the row with the overall earliest non-null value of the timestamp column.<br />If the earliest non-null value of the timestamp column appears in multiple rows, the `expr` may be taken from any of those rows. If `expr` does not come from a relation with a timestamp, then it is simply the [...] -|`EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the earliest value of `expr`.<br />The earliest value of `expr` is taken from the row with the overall earliest non-null value of `timestampExpr`. <br />If the earliest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit ar [...] -|`LATEST(expr, [maxBytesPerValue])`|Returns the latest value of `expr`<br />The `expr` must come from a relation with a timestamp column (like `__time` in a Druid datasource) and the "latest" is taken from the row with the overall latest non-null value of the timestamp column.<br />If the latest non-null value of the timestamp column appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of spac [...] 
-|`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value of `expr`.<br />The latest value of `expr` is taken from the row with the overall latest non-null value of `timestampExpr`.<br />If the overall latest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are t [...]
-|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any value of `expr` including null. This aggregator can simplify and optimize the performance by returning the first encountered value (including `null`).<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to w [...]
+|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|`null`|
+|`EARLIEST(expr, [maxBytesPerValue])`|Returns the earliest value of `expr`.<br />If `expr` comes from a relation with a timestamp column (like `__time` in a Druid datasource), the "earliest" is taken from the row with the overall earliest non-null value of the timestamp column.<br />If the earliest non-null value of the timestamp column appears in multiple rows, the `expr` may be taken from any of those rows. If `expr` does not come from a relation with a timestamp, then it is simply the [...]
+|`EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the earliest value of `expr`.<br />The earliest value of `expr` is taken from the row with the overall earliest non-null value of `timestampExpr`. <br />If the earliest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit ar [...]
+|`LATEST(expr, [maxBytesPerValue])`|Returns the latest value of `expr`<br />The `expr` must come from a relation with a timestamp column (like `__time` in a Druid datasource) and the "latest" is taken from the row with the overall latest non-null value of the timestamp column.<br />If the latest non-null value of the timestamp column appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of spac [...]
+|`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value of `expr`.<br />The latest value of `expr` is taken from the row with the overall latest non-null value of `timestampExpr`.<br />If the overall latest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are t [...]
+|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any value of `expr` including null. This aggregator can simplify and optimize the performance by returning the first encountered value (including `null`).<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to w [...]
 |`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy dimension is included in a row, when using `GROUPING SETS`. Refer to [additional documentation](aggregations.md#grouping-aggregator) on how to infer this number.|N/A|
 |`ARRAY_AGG(expr, [size])`|Collects all values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes). If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results within the output array may vary depending on processing order.|`null`|
 |`ARRAY_AGG(DISTINCT expr, [size])`|Collects all distinct values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results will be based on the default for the element type.|`null`|
 |`ARRAY_CONCAT_AGG(expr, [size])`|Concatenates all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes). Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not currently supported, and the orderi [...]
 |`ARRAY_CONCAT_AGG(DISTINCT expr, [size])`|Concatenates all distinct values of all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expressio [...]
-|`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or all distinct values) of `expr` into a single STRING, ignoring null values. Each value is joined by an optional `separator`, which must be a literal STRING. If the `separator` is not provided, strings are concatenated without a separator.<br /><br />An optional `size` in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, th [...]
-|`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for `STRING_AGG`.|`null` or `''` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`BIT_OR(expr)`|Performs a bitwise OR operation on all input values.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`BIT_XOR(expr)`|Performs a bitwise XOR operation on all input values.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
+|`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or all distinct values) of `expr` into a single STRING, ignoring null values. Each value is joined by an optional `separator`, which must be a literal STRING. If the `separator` is not provided, strings are concatenated without a separator.<br /><br />An optional `size` in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, th [...]
+|`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for `STRING_AGG`.|`null`|
+|`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null`|
+|`BIT_OR(expr)`|Performs a bitwise OR operation on all input values.|`null`|
+|`BIT_XOR(expr)`|Performs a bitwise XOR operation on all input values.|`null`|
 
 ## Sketch functions
 
diff --git a/docs/querying/sql-array-functions.md b/docs/querying/sql-array-functions.md
index 7b0f2112b6f..eaa7ebf50d7 100644
--- a/docs/querying/sql-array-functions.md
+++ b/docs/querying/sql-array-functions.md
@@ -55,8 +55,8 @@ The following table describes array functions. To learn more about array aggrega
 |`ARRAY_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns true if `arr` contains `expr`. If `expr` is an array, returns true if `arr` contains all elements of `expr`. Otherwise returns false.|
 |`ARRAY_OVERLAP(arr1, arr2)`|Returns true if `arr1` and `arr2` have any elements in common, else false.|
 |`SCALAR_IN_ARRAY(expr, arr)`|Returns true if the scalar `expr` is present in `arr`. Otherwise, returns false if the scalar `expr` is non-null or `UNKNOWN` if the scalar `expr` is `NULL`.|
-|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).|
-|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).|
+|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
+|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
 |`ARRAY_PREPEND(expr, arr)`|Adds `expr` to the beginning of `arr`, the resulting array type determined by the type of `arr`.|
 |`ARRAY_APPEND(arr, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of `arr`.|
 |`ARRAY_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.|
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 092e6823250..08492fcf76d 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -67,7 +67,7 @@ The following table describes how Druid maps SQL types onto native types when ru
 |OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
 
 <sup>*</sup>
-The default value is <code>NULL</code> for all types, except in the deprecated legacy mode (<code>druid.generic.useDefaultValueForNull = true</code>) which initialize a default value.
+The default value is <code>NULL</code> for all types.
 <br /><br />
 
 For casts between two SQL types, the behavior depends on the runtime type:
@@ -76,7 +76,6 @@ For casts between two SQL types, the behavior depends on the runtime type:
 * Casts between two SQL types that have different Druid runtime types generate a runtime cast in Druid. If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`, Druid a substitutes [NULL](#null-values).
-When `druid.generic.useDefaultValueForNull = true` (deprecated legacy mode), Druid instead substitutes a default value, including when NULL values cast to non-nullable types. For example, if `druid.generic.useDefaultValueForNull = true`, a null VARCHAR cast to BIGINT is converted to a zero.
 
 ## Arrays
@@ -141,53 +140,12 @@ as regular JSON arrays instead of in stringified form.
 
 ## NULL values
 
 By default, Druid treats NULL values similarly to the ANSI SQL standard.
-In the default mode:
-- numeric NULL is permitted.
-- NULL values and empty strings are not equal.
-This manner of null handling applies to both storage and queries.
-The [`druid.generic.useDefaultValueForNull`](../configuration/index.md#sql-compatible-null-handling)
-runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, maintain the default value of `false`.
-
-There is some performance impact for null handling. see [segment internals](../design/segments.md#handling-null-values) for more information. For examples of null handling, see the [null handling tutorial](../tutorials/tutorial-sql-null.md).
-### Legacy null handling mode
-
-:::info
-To ensure Druid always behaves in an ANSI SQL compatible manner, this mode will be removed in a future release.
-:::
-
-You can set `druid.generic.useDefaultValueForNull = true` to revert to Druid's deprecated legacy null handling mode, the default for Druid 27.0.0 and prior releases. This mode is not recommended.
-
-When running in the deprecated legacy mode, Druid treats NULL values and empty strings interchangeably.
-In this mode:
-- Druid does not distinguish between empty strings and nulls.
-- Druid SQL only has partial support for NULLs.
-- Numeric columns are not nullable; null or missing values are treated as 0.
-
-For example, the following expressions are equivalent:
-
-- col IS NULL
-- col = ''
-
-Both evaluate to true if col contains an empty string.
-Similarly, the expression COALESCE(`col1`, `col2`) returns `col2` if `col1` is an empty string.
-
-The COUNT(*) aggregator counts all rows but the COUNT(expr) aggregator counts the number of rows where expr is neither null nor the empty string.
-
 ## Boolean logic
 
-By default, Druid uses [SQL three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL) for filter processing
-and boolean expression evaluation. This behavior relies on three settings:
-
-* [`druid.generic.useDefaultValueForNull`](../configuration/index.md#sql-compatible-null-handling) must be set to false (default), a runtime property which allows NULL values to exist in numeric columns and expressions, and string typed columns to distinguish between NULL and the empty string
-* [`druid.expressions.useStrictBooleans`](../configuration/index.md#expression-processing-configurations) must be set to true (default), a runtime property controls Druid's boolean logic mode for expressions, as well as coercing all expression boolean values to be represented with a 1 for true and 0 for false
-* [`druid.generic.useThreeValueLogicForNativeFilters`](../configuration/index.md#sql-compatible-null-handling) must be set to true (default), a runtime property which decouples three-value logic handling from `druid.generic.useDefaultValueForNull` and `druid.expressions.useStrictBooleans` for backwards compatibility with older versions of Druid that did not fully support SQL compatible null value logic handling
-
-If any of these settings is configured with a non-default value, Druid will use two-valued logic for non-expression based filters. Expression based filters are controlled independently with `druid.expressions.useStrictBooleans`, which if set to false Druid will use two-valued logic for expressions.
-
-These configurations have been deprecated and will be removed in a future release so that Druid always has SQL compliant behavior.
+Druid uses [SQL three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL) for filter processing and boolean expression evaluation.
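The three-valued logic referenced above can be made concrete with a small illustration. The following is a hedged sketch, not Druid code: `null` stands in for SQL's UNKNOWN, and the truth tables follow the ANSI SQL rules (FALSE dominates AND, TRUE dominates OR, UNKNOWN propagates otherwise).

```java
/** Illustrative sketch of SQL three-valued logic; null models UNKNOWN. */
public class ThreeValuedLogic
{
  static Boolean and(Boolean a, Boolean b)
  {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
      return false;   // FALSE dominates AND regardless of UNKNOWN
    }
    if (a == null || b == null) {
      return null;    // UNKNOWN propagates otherwise
    }
    return true;
  }

  static Boolean or(Boolean a, Boolean b)
  {
    if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
      return true;    // TRUE dominates OR regardless of UNKNOWN
    }
    if (a == null || b == null) {
      return null;
    }
    return false;
  }

  static Boolean not(Boolean a)
  {
    return a == null ? null : !a;  // NOT UNKNOWN is still UNKNOWN
  }

  public static void main(String[] args)
  {
    // This is why `x <> 'some value'` drops NULL rows: the predicate is UNKNOWN, not TRUE.
    System.out.println(and(null, true));   // null
    System.out.println(or(null, true));    // true
    System.out.println(not(null));         // null
  }
}
```

This explains the filter behavior the docs describe: a row passes a WHERE clause only when the predicate evaluates to TRUE, so UNKNOWN-valued comparisons against NULL filter the row out.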
 ## Nested columns
diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md
index 3f45baab90b..666e06d548d 100644
--- a/docs/querying/sql-functions.md
+++ b/docs/querying/sql-functions.md
@@ -181,7 +181,7 @@ Returns the array element at the 0-based index supplied, or null for an out of r
 
 **Function type:** [Array](./sql-array-functions.md)
 
-Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).
+Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.
 
 ## ARRAY_ORDINAL
 
@@ -196,7 +196,7 @@ Returns the array element at the 1-based index supplied, or null for an out of r
 
 **Function type:** [Array](./sql-array-functions.md)
 
-Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).
+Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.
 
 ## ARRAY_OVERLAP
 
diff --git a/docs/querying/sql-metadata-tables.md b/docs/querying/sql-metadata-tables.md
index b04b3f11e65..1192df5b035 100644
--- a/docs/querying/sql-metadata-tables.md
+++ b/docs/querying/sql-metadata-tables.md
@@ -234,7 +234,7 @@ Servers table lists all discovered servers in the cluster.
 |tier|VARCHAR|Distribution tier see [druid.server.tier](../configuration/index.md#historical-general-configuration). Only valid for HISTORICAL type, for other types it's null|
 |current_size|BIGINT|Current size of segments in bytes on this server. Only valid for HISTORICAL type, for other types it's 0|
 |max_size|BIGINT|Max size in bytes this server recommends to assign to segments see [druid.server.maxSize](../configuration/index.md#historical-general-configuration). Only valid for HISTORICAL type, for other types it's 0|
-|is_leader|BIGINT|1 if the server is currently the 'leader' (for services which have the concept of leadership), otherwise 0 if the server is not the leader, or the default long value (null or zero depending on `druid.generic.useDefaultValueForNull`) if the server type does not have the concept of leadership|
+|is_leader|BIGINT|1 if the server is currently the 'leader' (for services which have the concept of leadership), otherwise 0 if the server is not the leader, or null if the server type does not have the concept of leadership|
 |start_time|STRING|Timestamp in ISO8601 format when the server was announced in the cluster|
 
 To retrieve information about all servers, use the query:
diff --git a/docs/querying/sql-multivalue-string-functions.md b/docs/querying/sql-multivalue-string-functions.md
index c2eaadc0976..4851a1ab3f1 100644
--- a/docs/querying/sql-multivalue-string-functions.md
+++ b/docs/querying/sql-multivalue-string-functions.md
@@ -55,8 +55,8 @@ All array references in the multi-value string function documentation can refer
 |`MV_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.|
 |`MV_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` contains `expr`. If `expr` is an array, returns 1 if `arr` contains all elements of `expr`. Otherwise returns 0.|
 |`MV_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements in common, else 0.|
-|`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or -1 if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).|
-|`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode).|
+|`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
+|`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
 |`MV_PREPEND(expr, arr)`|Adds `expr` to the beginning of `arr`, the resulting array type determined by the type `arr`.|
 |`MV_APPEND(arr, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of `arr`.|
 |`MV_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.|
diff --git a/docs/querying/sql-query-context.md b/docs/querying/sql-query-context.md
index 775bd4ca48f..219538251e2 100644
--- a/docs/querying/sql-query-context.md
+++ b/docs/querying/sql-query-context.md
@@ -48,7 +48,7 @@ For more information, see [Overriding default query context values](../configura
 |`enableTimeBoundaryPlanning`|If `true`, Druid converts SQL queries to [time boundary queries](timeboundaryquery.md) wherever possible. Time boundary queries are very efficient for min-max calculation on the `__time` column in a datasource. |`false`|
 |`useNativeQueryExplain`|If `true`, `EXPLAIN PLAN FOR` returns the explain plan as a JSON representation of equivalent native query, else it returns the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. We don't recommend setting this parameter unless your application depends on the older behavior.|`true`|
 |`sqlFinalizeOuterSketches`|If `false` (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results. If `true` (default behavior in Druid 24.0.1 and earlier), Druid finalizes sketches from these functions when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. We don't recommend setting this parameter unless your application uses Druid 2 [...]
-|`sqlUseBoundAndSelectors`|If `false` (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner uses [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). For filtering `ARRAY` typed values, `sqlUseBoundAndSelectors` must be `false`. | Defaults to same value as `druid.generic.u [...]
+|`sqlUseBoundAndSelectors`|If `false` (default behavior in Druid 27.0.0 and later), the SQL planner uses [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). For filtering `ARRAY` typed values, `sqlUseBoundAndSelectors` must be `false`. | `false`.|
 |`sqlReverseLookup`|Whether to consider the [reverse-lookup rewrite](lookups.md#reverse-lookup) of the `LOOKUP` function during SQL planning.<br /><br />Druid reverses calls to `LOOKUP` only when the number of matching keys is lower than both `inSubQueryThreshold` and `sqlReverseLookupThreshold`.|`true`|
 |`sqlReverseLookupThreshold`|Maximum size of `IN` filter to create when applying a [reverse-lookup rewrite](lookups.md#reverse-lookup). If a `LOOKUP` call matches more keys than the specified threshold, it remains unchanged.<br /><br />If `inSubQueryThreshold` is lower than `sqlReverseLookupThreshold`, Druid uses `inSubQueryThreshold` threshold instead.|10000|
 |`sqlPullUpLookup`|Whether to consider the [pull-up rewrite](lookups.md#pull-up) of the `LOOKUP` function during SQL planning.|`true`|
diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md
index 2c08fc63669..fe91aa0055d 100644
--- a/docs/querying/sql-scalar.md
+++ b/docs/querying/sql-scalar.md
@@ -82,7 +82,7 @@ to FLOAT. At runtime, Druid will widen 32-bit floats to 64-bit for most expressi
 |`HUMAN_READABLE_BINARY_BYTE_FORMAT(value[, precision])`| Formats a number in human-readable [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. For example, `HUMAN_READABLE_BINARY_BYTE_FORMAT(1048576)` returns `1.00 MiB`. `precision` must be in the range of `[0, 3]`. If not specified, `precision` defaults to 2. |
 |`HUMAN_READABLE_DECIMAL_BYTE_FORMAT(value[, precision])`| Formats a number in human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. For example, `HUMAN_READABLE_DECIMAL_BYTE_FORMAT(1048576)` returns `1.04 MB`. `precision` must be in the range of `[0, 3]`. If not specified, `precision` defaults to 2. |
 |`HUMAN_READABLE_DECIMAL_FORMAT(value[, precision])`| Formats a number in human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. For example, `HUMAN_READABLE_DECIMAL_FORMAT(1048576)` returns `1.04 M`. `precision` must be in the range of `[0, 3]`. If not specified, `precision` defaults to 2. |
-|`SAFE_DIVIDE(x, y)`|Returns the division of x by y guarded on division by 0. In case y is 0 it returns 0, or `null` if `druid.generic.useDefaultValueForNull=false` |
+|`SAFE_DIVIDE(x, y)`|Returns the division of x by y guarded on division by 0. In case y is 0 it returns `null`|
 
 ## String functions
 
@@ -109,7 +109,7 @@ String functions accept strings and return a type appropriate to the function.
 |`RPAD(expr, length[, chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, right pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
 |`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
 |`POSITION(substring IN expr [FROM startingIndex])`|Returns the index of `substring` within `expr` with indexes starting from 1. The search begins at `startingIndex`. If `startingIndex` is not specified, the default is 1. If `substring` is not found, returns 0.|
-|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matches the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. When `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a [...]
+|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matches the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern.|
 |`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](sql-operators.md#logical-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.|
 |`REGEXP_REPLACE(expr, pattern, replacement)`|Replaces all occurrences of regular expression `pattern` within `expr` with `replacement`. The replacement string may refer to capture groups using `$1`, `$2`, etc. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.|
 |`REPLACE(expr, substring, replacement)`|Replaces instances of `substring` in `expr` with `replacement` and returns the result.|
diff --git a/docs/release-info/migr-ansi-sql-null.md b/docs/release-info/migr-ansi-sql-null.md
index 5e5cd4ecef0..1655867ca15 100644
--- a/docs/release-info/migr-ansi-sql-null.md
+++ b/docs/release-info/migr-ansi-sql-null.md
@@ -25,44 +25,34 @@ sidebar_label: SQL compliant mode
 
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-In Apache Druid 28.0.0, the default [null handling](../querying/sql-data-types.md#null-values) mode changed to be compliant with the ANSI SQL standard.
-This guide provides strategies for Druid operators who rely on legacy Druid null handling behavior in their applications to transition to SQL compliant mode.
-Legacy mode is planned for removal from Druid.
+In Apache Druid 32.0.0, legacy configurations which were incompatible with the ANSI SQL standard were removed.
+
+These configurations were:
+* `druid.generic.useDefaultValueForNull`
+* `druid.expressions.useStrictBooleans`
+* `druid.generic.useThreeValueLogicForNativeFilters`
+
+This guide provides strategies for Druid operators who rely on legacy Druid null handling behavior in their applications to transition to Druid 32.0.0 or later.
 
 ## SQL compliant null handling
 
-As of Druid 28.0.0, Druid writes segments in an ANSI SQL compatible null handling mode by default.
-This means that Druid stores null values distinctly from empty strings for string dimensions and distinctly from 0 for numeric dimensions.
+As of Druid 28.0.0, Druid writes segments in an ANSI SQL compatible null handling mode by default, and in Druid 32.0.0 this is no longer configurable.
+This is a change of legacy behavior and means that Druid stores null values distinctly from empty strings for string dimensions and distinctly from 0 for numeric dimensions.
 
 This can impact your application behavior because the ANSI SQL standard defines any comparison to null to be unknown.
 According to this three-valued logic, `x <> 'some value'` only returns non-null values.
 
-The default Druid configurations for 28.0.0 and later that enable ANSI SQL compatible null handling mode are the following:
-
-* `druid.generic.useDefaultValueForNull=false`
-* `druid.expressions.useStrictBooleans=true`
-* `druid.generic.useThreeValueLogicForNativeFilters=true`
-
-Follow the [Null handling tutorial](../tutorials/tutorial-sql-null.md) to learn how the default null handling works in Druid.
+Follow the [Null handling tutorial](../tutorials/tutorial-sql-null.md) to learn how null handling works in Druid.
 
-## Legacy null handling and two-valued logic
+## Legacy null handling and two-valued filter logic
 
 Prior to Druid 28.0.0, Druid defaulted to a legacy mode which stored default values instead of nulls.
-In legacy mode, Druid created segments with the following characteristics at ingestion time:
+In this mode, Druid created segments with the following characteristics at ingestion time:
 
 - String columns couldn't distinguish an empty string, `''`, from null. Therefore, Druid treated them both as interchangeable values.
 - Numeric columns couldn't represent null valued rows.
-  Therefore, Druid stored `0` instead of `null`.
-
-The Druid configurations for the deprecated legacy mode were the following:
-
-* `druid.generic.useDefaultValueForNull=true`
-* `druid.expressions.useStrictBooleans=false`
-* `druid.generic.useThreeValueLogicForNativeFilters=true`
-
-These configurations are deprecated and scheduled for removal.
-After the configurations are removed, Druid will ignore them if they exist in your configuration files and use the default SQL compliant mode.
+  Therefore, Druid stored `0` instead of `null`.
 
 ## Migrate to SQL compliant mode
 
@@ -382,5 +372,5 @@ Druid returns the following:
 
 See the following topics for more information:
 - [Null handling tutorial](../tutorials/tutorial-sql-null.md) to learn how the default null handling works in Druid.
- - [Null values](../querying/sql-data-types.md#null-values) for a description of Druid's behavior with null values.
+ - [Null values](../querying/sql-data-types.md#null-values) for a description of Druid's null values.
 - [Handling null values](../design/segments.md#handling-null-values) for details about how Druid stores null values.
\ No newline at end of file
diff --git a/docs/release-info/upgrade-notes.md b/docs/release-info/upgrade-notes.md
index d3c6ce1cba3..564dfa75246 100644
--- a/docs/release-info/upgrade-notes.md
+++ b/docs/release-info/upgrade-notes.md
@@ -30,7 +30,7 @@ For the full release notes for a specific version, see the [releases page](https
 
 #### Front-coded dictionaries
 
-In Druid 32.0.0, the front coded dictionaries feature will be turned on by default. Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
+Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar.
 
 Once this feature is on, you cannot easily downgrade to an earlier version that does not support the feature.
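The behavior change the migration guide walks through can be illustrated with a small sketch. This is not Druid code; it is a hedged Java illustration of why `col = ''` matches a different number of rows once null stops being folded into the empty string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/**
 * Illustrative sketch (not Druid code) of the migration: legacy mode folded
 * null into '' for strings, SQL-compliant mode keeps null distinct.
 */
public class NullMigrationSketch
{
  // Legacy ingestion coerced a missing string value to the empty-string default.
  static String legacyString(String v)
  {
    return v == null ? "" : v;
  }

  // Rows matching col = '' under legacy semantics, where null and '' are interchangeable.
  static long matchesEmptyLegacy(List<String> values)
  {
    return values.stream().filter(v -> legacyString(v).equals("")).count();
  }

  // Rows matching col = '' under SQL semantics: null = '' is UNKNOWN, so null rows do not match.
  static long matchesEmptySql(List<String> values)
  {
    return values.stream().filter(v -> Objects.equals(v, "")).count();
  }

  public static void main(String[] args)
  {
    List<String> col = Arrays.asList("a", null, "");
    System.out.println(matchesEmptyLegacy(col) + " vs " + matchesEmptySql(col)); // prints "2 vs 1"
  }
}
```

The same shift applies to numeric columns: a query that previously saw `0` for a missing value now sees `null`, which is skipped by most aggregators rather than contributing a zero.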
diff --git a/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java b/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
index 4b5fd885256..2e578b24dc5 100644
--- a/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
+++ b/processing/src/main/java/org/apache/druid/common/config/NullValueHandlingConfig.java
@@ -21,7 +21,6 @@ package org.apache.druid.common.config;
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.java.util.common.StringUtils;
 import org.apache.druid.java.util.common.logger.Logger;
 
 public class NullValueHandlingConfig
@@ -71,23 +70,5 @@ public class NullValueHandlingConfig
         this.ignoreNullsForStringCardinality = false;
       }
     }
-    String version = NullValueHandlingConfig.class.getPackage().getImplementationVersion();
-    if (version == null || version.contains("SNAPSHOT")) {
-      version = "latest";
-    }
-    final String docsBaseFormat = "https://druid.apache.org/docs/%s/querying/sql-data-types#%s";
-
-    if (this.useDefaultValuesForNull) {
-      LOG.warn(
-          "druid.generic.useDefaultValueForNull set to 'true', but has been removed and is always 'false' for the most SQL compliant behavior, see %s for details",
-          StringUtils.format(docsBaseFormat, version, "null-values")
-      );
-    }
-    if (!this.useThreeValueLogicForNativeFilters) {
-      LOG.warn(
-          "druid.generic.useThreeValueLogicForNativeFilters set to 'false', but has been removed and is always 'true' for the most SQL compliant behavior, see %s for details",
-          StringUtils.format(docsBaseFormat, version, "boolean-logic")
-      );
-    }
   }
 }
diff --git a/processing/src/main/java/org/apache/druid/guice/StartupInjectorBuilder.java b/processing/src/main/java/org/apache/druid/guice/StartupInjectorBuilder.java
index e091e6eb522..bdf73993efa 100644
--- a/processing/src/main/java/org/apache/druid/guice/StartupInjectorBuilder.java
+++ b/processing/src/main/java/org/apache/druid/guice/StartupInjectorBuilder.java
@@ -19,8 +19,15 @@ package org.apache.druid.guice;
 
+import com.google.common.annotations.VisibleForTesting;
+import com.google.inject.Inject;
+import com.google.inject.Injector;
 import com.google.inject.util.Providers;
+import org.apache.druid.common.config.NullValueHandlingConfig;
 import org.apache.druid.jackson.JacksonModule;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.math.expr.ExpressionProcessingConfig;
 import org.apache.druid.math.expr.ExpressionProcessingModule;
 import org.apache.druid.utils.RuntimeInfo;
 
@@ -52,10 +59,19 @@ public class StartupInjectorBuilder extends BaseInjectorBuilder<StartupInjectorB
         new ConfigModule(),
         new NullHandlingModule(),
         new ExpressionProcessingModule(),
-        binder -> binder.bind(DruidSecondaryModule.class)
+        binder -> binder.bind(DruidSecondaryModule.class),
+        binder -> binder.bind(PropertiesValidator.class) // this gets properties injected, later call to validate checks
     );
   }
 
+  @Override
+  public Injector build()
+  {
+    Injector injector = super.build();
+    injector.getInstance(PropertiesValidator.class).validate();
+    return injector;
+  }
+
   public StartupInjectorBuilder withProperties(Properties properties)
   {
     add(binder -> binder.bind(Properties.class).toInstance(properties));
@@ -97,4 +113,68 @@ public class StartupInjectorBuilder extends BaseInjectorBuilder<StartupInjectorB
     });
     return this;
   }
+
+  /**
+   * Centralized validation of runtime.properties to allow checking for configuration which has been removed and
+   * alerting or failing fast as needed.
+   */
+  private static final class PropertiesValidator
+  {
+    private final Properties properties;
+
+    @Inject
+    public PropertiesValidator(Properties properties)
+    {
+      this.properties = properties;
+    }
+
+    public void validate()
+    {
+      final boolean defaultValueMode = Boolean.parseBoolean(
+          properties.getProperty(NullValueHandlingConfig.NULL_HANDLING_CONFIG_STRING, "false")
+      );
+      if (defaultValueMode) {
+        final String docsLink = StringUtils.format("https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null", getVersionString());
+        throw new ISE(
+            "%s set to 'true', but has been removed, see %s for details for how to migrate to SQL compliant behavior",
+            NullValueHandlingConfig.NULL_HANDLING_CONFIG_STRING,
+            docsLink
+        );
+      }
+
+      final boolean no3vl = !Boolean.parseBoolean(
+          properties.getProperty(NullValueHandlingConfig.THREE_VALUE_LOGIC_CONFIG_STRING, "true")
+      );
+      if (no3vl) {
+        final String docsLink = StringUtils.format("https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null", getVersionString());
+        throw new ISE(
+            "%s set to 'false', but has been removed, see %s for details for how to migrate to SQL compliant behavior",
+            NullValueHandlingConfig.THREE_VALUE_LOGIC_CONFIG_STRING,
+            docsLink
+        );
+      }
+
+      final boolean nonStrictExpressions = !Boolean.parseBoolean(
+          properties.getProperty(ExpressionProcessingConfig.NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING, "true")
+      );
+      if (nonStrictExpressions) {
+        final String docsLink = StringUtils.format("https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null", getVersionString());
+        throw new ISE(
+            "%s set to 'false', but has been removed, see %s for details for how to migrate to SQL compliant behavior",
+            ExpressionProcessingConfig.NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING,
+            docsLink
+        );
+      }
+    }
+  }
+
+  @VisibleForTesting
+  public static String getVersionString()
+  {
+    final String version = StartupInjectorBuilder.class.getPackage().getImplementationVersion();
+    if (version == null || version.contains("SNAPSHOT")) {
+      return "latest";
+    }
+    return version;
+  }
 }
diff --git a/processing/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java b/processing/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
index a0d58eb4806..712fac379c8 100644
--- a/processing/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
+++ b/processing/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
@@ -21,7 +21,6 @@ package org.apache.druid.math.expr;
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.java.util.common.StringUtils;
 import org.apache.druid.java.util.common.logger.Logger;
 
 import javax.annotation.Nullable;
@@ -30,7 +29,6 @@ public class ExpressionProcessingConfig
 {
   private static final Logger LOG = new Logger(ExpressionProcessingConfig.class);
 
-  @Deprecated
   public static final String NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING = "druid.expressions.useStrictBooleans";
 
   // Coerce arrays to multi value strings
   public static final String PROCESS_ARRAYS_AS_MULTIVALUE_STRINGS_CONFIG_STRING =
@@ -78,17 +76,6 @@ public class ExpressionProcessingConfig
         allowVectorizeFallback,
         ALLOW_VECTORIZE_FALLBACK
     );
-    String version = ExpressionProcessingConfig.class.getPackage().getImplementationVersion();
-    if (version == null || version.contains("SNAPSHOT")) {
-      version = "latest";
-    }
-    final String docsBaseFormat = "https://druid.apache.org/docs/%s/querying/sql-data-types#%s";
-    if (!this.useStrictBooleans) {
-      LOG.warn(
-          "druid.expressions.useStrictBooleans set to 'false', but has been removed from Druid and is always 'true' now for the most SQL compliant behavior, see %s for details",
-          StringUtils.format(docsBaseFormat, version, "boolean-logic")
-      );
-    }
   }
 
   public boolean processArraysAsMultiValueStrings()
diff --git a/processing/src/test/java/org/apache/druid/guice/StartupInjectorBuilderTest.java b/processing/src/test/java/org/apache/druid/guice/StartupInjectorBuilderTest.java
index a20cf33c7c0..892434defce 100644
--- a/processing/src/test/java/org/apache/druid/guice/StartupInjectorBuilderTest.java
+++ b/processing/src/test/java/org/apache/druid/guice/StartupInjectorBuilderTest.java
@@ -20,7 +20,12 @@ package org.apache.druid.guice;
 
 import com.google.inject.Injector;
+import org.apache.druid.common.config.NullValueHandlingConfig;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.math.expr.ExpressionProcessingConfig;
 import org.apache.druid.utils.RuntimeInfo;
+import org.junit.Assert;
 import org.junit.Test;
 
 import java.util.Collections;
@@ -135,4 +140,58 @@ public class StartupInjectorBuilderTest
   }
 
   // Can't test the server option here: there are no actual property files to read.
+
+  @Test
+  public void testValidator()
+  {
+    final Properties propsDefaultValueMode = new Properties();
+    propsDefaultValueMode.put(NullValueHandlingConfig.NULL_HANDLING_CONFIG_STRING, "true");
+
+    Throwable t = Assert.assertThrows(
+        ISE.class,
+        () -> new StartupInjectorBuilder().withExtensions()
+                                          .withProperties(propsDefaultValueMode)
+                                          .build()
+    );
+    Assert.assertEquals(
+        StringUtils.format(
+            "druid.generic.useDefaultValueForNull set to 'true', but has been removed, see https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null for details for how to migrate to SQL compliant behavior",
+            StartupInjectorBuilder.getVersionString()
+        ),
+        t.getMessage()
+    );
+
+    final Properties propsNo3vl = new Properties();
+    propsNo3vl.put(NullValueHandlingConfig.THREE_VALUE_LOGIC_CONFIG_STRING, "false");
+    t = Assert.assertThrows(
+        ISE.class,
+        () -> new StartupInjectorBuilder().withExtensions()
+                                          .withProperties(propsNo3vl)
+                                          .build()
+    );
+    Assert.assertEquals(
+        StringUtils.format(
+            "druid.generic.useThreeValueLogicForNativeFilters set to 'false', but has been removed, see https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null for details for how to migrate to SQL compliant behavior",
+            StartupInjectorBuilder.getVersionString()
+        ),
+        t.getMessage()
+    );
+
+    final Properties propsNonStrictBooleans = new Properties();
+    propsNonStrictBooleans.put(ExpressionProcessingConfig.NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING, "false");
+
+    t = Assert.assertThrows(
+        ISE.class,
+        () -> new StartupInjectorBuilder().withExtensions()
+                                          .withProperties(propsNonStrictBooleans)
+                                          .build()
+    );
+    Assert.assertEquals(
+        StringUtils.format(
+            "druid.expressions.useStrictBooleans set to 'false', but has been removed, see https://druid.apache.org/docs/%s/release-info/migr-ansi-sql-null for details for how to migrate to SQL compliant behavior",
+            StartupInjectorBuilder.getVersionString()
+        ),
+        t.getMessage()
+    );
+  }
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
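The PropertiesValidator added in this commit boils down to a fail-fast check over runtime.properties at startup: if a removed property is still set to its no-longer-supported value, throw before the service comes up. Here is a simplified stand-alone sketch of that pattern; the `RemovedPropertiesCheck` class and its message text are illustrative, not Druid's implementation, and only the property names are taken from the commit.

```java
import java.util.Map;
import java.util.Properties;

// Fail-fast startup validation sketch: reject removed configuration properties
// instead of silently ignoring them or merely logging a warning.
public class RemovedPropertiesCheck
{
  // removed property -> the only behavior still supported (the hard-coded default)
  private static final Map<String, String> REMOVED = Map.of(
      "druid.generic.useDefaultValueForNull", "false",
      "druid.generic.useThreeValueLogicForNativeFilters", "true",
      "druid.expressions.useStrictBooleans", "true"
  );

  public static void validate(Properties properties)
  {
    for (Map.Entry<String, String> entry : REMOVED.entrySet()) {
      final String value = properties.getProperty(entry.getKey());
      if (value != null && !entry.getValue().equalsIgnoreCase(value.trim())) {
        // Throwing here aborts startup before any service accepts traffic.
        throw new IllegalStateException(
            entry.getKey() + " has been removed and now always behaves as '"
            + entry.getValue() + "'; remove it from runtime.properties"
        );
      }
    }
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty("druid.generic.useDefaultValueForNull", "true");
    try {
      validate(props);
    } catch (IllegalStateException e) {
      System.out.println("failed fast: " + e.getMessage());
    }
  }
}
```

Failing fast like this is preferable to the removed `LOG.warn` approach, because an operator who still relies on legacy null handling finds out at startup rather than from silently changed query results.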