This is an automated email from the ASF dual-hosted git repository.
vogievetsky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 7f3c117e3a SQL: Improve docs around casts. (#13466)
7f3c117e3a is described below
commit 7f3c117e3ab0971dd52180ac99d4da6cc092b598
Author: Gian Merlino <[email protected]>
AuthorDate: Thu Dec 15 15:01:40 2022 -0800
SQL: Improve docs around casts. (#13466)
Main change: clarify that the "default value" for casts only applies if
druid.generic.useDefaultValueForNull = true.
Secondary change: adjust a bunch of wording from future to present tense.
---
docs/querying/sql-data-types.md | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 11c8cda0fc..bb1766d27e 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -40,14 +40,17 @@ milliseconds since 1970-01-01 00:00:00 UTC, not counting
leap seconds. Therefore
timezone information, but only carry information about the exact moment in
time they represent. See the
[Time functions](sql-scalar.md#date-and-time-functions) section for more
information about timestamp handling.
-The following table describes how Druid maps SQL types onto native types at
query runtime. Casts between two SQL types
-that have the same Druid runtime type will have no effect, other than
exceptions noted in the table. Casts between two
-SQL types that have different Druid runtime types will generate a runtime cast
in Druid. If a value cannot be properly
-cast to another value, as in `CAST('foo' AS BIGINT)`, the runtime will
substitute a default value. NULL values cast
-to non-nullable types will also be substituted with a default value (for
example, nulls cast to numbers will be
-converted to zeroes).
-
-|SQL type|Druid runtime type|Default value|Notes|
+Casts between two SQL types with the same Druid runtime type (see below table)
have no effect, other than exceptions
+noted in the table. Casts between two SQL types that have different Druid
runtime types generate a runtime cast in
+Druid. If a value cannot be cast to the target type, as in `CAST('foo' AS
BIGINT)`, Druid either substitutes a default
+value (when `druid.generic.useDefaultValueForNull = true`, the default mode),
or substitutes [NULL](#null-values) (when
+`druid.generic.useDefaultValueForNull = false`). NULL values cast to
non-nullable types are also substituted with a
+default value. For example, if `druid.generic.useDefaultValueForNull = true`,
a null VARCHAR cast to BIGINT is converted
+to a zero.
+
+The following table describes how Druid maps SQL types onto native types when
running queries.
+
+|SQL type|Druid runtime type|Default value<sup>*</sup>|Notes|
|--------|------------------|-------------|-----|
|CHAR|STRING|`''`||
|VARCHAR|STRING|`''`|Druid STRING columns are reported as VARCHAR. Can include
[multi-value strings](#multi-value-strings) as well.|
@@ -64,21 +67,23 @@ converted to zeroes).
|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the
timestamp to the nearest day. Casts between string and date types assume
standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one
of the [time functions](sql-scalar.md#date-and-time-functions).|
|OTHER|COMPLEX|none|May represent various Druid column types such as
hyperUnique, approxHistogram, etc.|
+<sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull =
true` (the default mode). Otherwise, the default value is `NULL` for all types.
+
## Multi-value strings
Druid's native type system allows strings to potentially have multiple values.
These
-[multi-value string dimensions](multi-value-dimensions.md) will be reported in
SQL as `VARCHAR` typed, and can be
-syntactically used like any other VARCHAR. Regular string functions that refer
to multi-value string dimensions will be
+[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL
as `VARCHAR` typed, and can be
+syntactically used like any other VARCHAR. Regular string functions that refer
to multi-value string dimensions are
applied to all values for each row individually. Multi-value string dimensions
can also be treated as arrays via special
[multi-value string functions](sql-multivalue-string-functions.md), which can
perform powerful array-aware operations.
-Grouping by a multi-value expression will observe the native Druid multi-value
aggregation behavior, which is similar to
+Grouping by a multi-value expression observes the native Druid multi-value
aggregation behavior, which is similar to
the `UNNEST` functionality available in some other SQL dialects. Refer to the
documentation on
[multi-value string dimensions](multi-value-dimensions.md) for additional
details.
> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`,
> there are some inconsistencies between how
> they are handled in Druid SQL and in native queries. For example,
> expressions involving multi-value dimensions may be
-> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND
multi_val_dim = 'b'` will be optimized to
+> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND
multi_val_dim = 'b'` is optimized to
> `false`, even though it is possible for a single row to have both "a" and
> "b" as values for `multi_val_dim`. The
> SQL behavior of multi-value dimensions will change in a future release to
> more closely align with their behavior
> in native queries.
@@ -90,11 +95,11 @@ runtime property controls Druid's NULL handling mode. For
the most SQL compliant
When `druid.generic.useDefaultValueForNull = true` (the default mode), Druid
treats NULLs and empty strings
interchangeably, rather than according to the SQL standard. In this mode Druid
SQL only has partial support for NULLs.
-For example, the expressions `col IS NULL` and `col = ''` are equivalent, and
both will evaluate to true if `col`
-contains an empty string. Similarly, the expression `COALESCE(col1, col2)`
will return `col2` if `col1` is an empty
-string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)`
aggregator will count the number of rows
+For example, the expressions `col IS NULL` and `col = ''` are equivalent, and
both evaluate to true if `col`
+contains an empty string. Similarly, the expression `COALESCE(col1, col2)`
returns `col2` if `col1` is an empty
+string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)`
aggregator counts the number of rows
where `expr` is neither null nor the empty string. Numeric columns in this
mode are not nullable; any null or missing
-values will be treated as zeroes.
+values are treated as zeroes.
When `druid.generic.useDefaultValueForNull = false`, NULLs are treated more
closely to the SQL standard. In this mode,
numeric NULL is permitted, and NULLs and empty strings are no longer treated
as interchangeable. This property
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]