gianm commented on a change in pull request #11188:
URL: https://github.com/apache/druid/pull/11188#discussion_r627818592
##########
File path: docs/querying/sql.md
##########
@@ -313,48 +313,51 @@ possible for two aggregators in the same SQL query to
have different filters.
Only the COUNT and ARRAY_AGG aggregations can accept DISTINCT.
+When no rows are selected, aggregate functions will return their initialized
value for the grouping they belong to. What this value is exactly for a given
aggregator is dependent on the configuration of Druid's SQL compatible null
handling mode, controlled by `druid.generic.useDefaultValueForNull`. The table
below defines the initial values for all aggregate functions in both modes.
Review comment:
Hmm this paragraph reads oddly to me for two reasons:
- At first blush I think a typical user would think it's impossible for
groups to exist that do not have any rows. (It isn't typical SQLy behavior.) So
we should list some examples of when the default value will show up. I can
think of two cases: grand total (aggregations with no `group by`) and filtered
aggregators where the filter does not match any rows within the group.
- Not all aggregators have behavior dependent on
`druid.generic.useDefaultValueForNull`, so it's not technically correct to say
this categorically. I don't think we need to mention
`druid.generic.useDefaultValueForNull` at all here, actually, because the
individual aggregators in the table below call it out when appropriate. Or, if
we do mention it, we could just say that it "may depend on" rather than "is
dependent on".
I'd welcome opinions from others about how to express this most clearly.
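For what it's worth, the two cases above can be sketched in standard SQL. This uses Python's `sqlite3` purely for illustration (the filtered aggregation is emulated with `CASE` rather than Druid's filtered aggregators, and Druid's actual fallback values additionally depend on the table below):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (grp TEXT, x INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 2)])

# Case 1: grand total (no GROUP BY) where the filter matches no rows.
# COUNT falls back to 0, while SUM falls back to NULL.
row = con.execute("SELECT COUNT(*), SUM(x) FROM t WHERE x > 100").fetchone()
print(row)  # (0, None)

# Case 2: a filtered aggregation (emulated with CASE) whose condition
# matches no rows within the group also yields the fallback value.
row2 = con.execute(
    "SELECT grp, SUM(CASE WHEN x > 100 THEN x END) FROM t GROUP BY grp"
).fetchone()
print(row2)  # ('a', None)
```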
##########
File path: docs/querying/sql.md
##########
@@ -313,48 +313,51 @@ possible for two aggregators in the same SQL query to
have different filters.
Only the COUNT and ARRAY_AGG aggregations can accept DISTINCT.
+When no rows are selected, aggregate functions will return their initialized
value for the grouping they belong to. What this value is exactly for a given
aggregator is dependent on the configuration of Druid's SQL compatible null
handling mode, controlled by `druid.generic.useDefaultValueForNull`. The table
below defines the initial values for all aggregate functions in both modes.
+
> The order of aggregation operations across segments is not deterministic.
> This means that non-commutative aggregation
> functions can produce inconsistent results across runs of the same query.
>
> Functions that operate on an input type of "float" or "double" may also see
> these differences in aggregation
> results across multiple query runs because of this. If precisely the same
> value is desired across multiple query runs,
> consider using the `ROUND` function to smooth out the inconsistencies
> between queries.
-|Function|Notes|
-|--------|-----|
-|`COUNT(*)`|Counts the number of rows.|
-|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string,
numeric, or hyperUnique. By default this is approximate, using a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To
get exact counts set "useApproximateCountDistinct" to "false". If you do this,
expr must be string or numeric, since exact counts are not possible using
hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode,
only one distinct count per query is permitted unless
`useGroupingSetForExactDistinct` is set to true in query contexts or broker
configurations.|
-|`SUM(expr)`|Sums numbers.|
-|`MIN(expr)`|Takes the minimum of numbers.|
-|`MAX(expr)`|Takes the maximum of numbers.|
-|`AVG(expr)`|Averages numbers.|
-|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a
regular column or a hyperUnique column. This is always approximate, regardless
of the value of "useApproximateCountDistinct". This uses Druid's built-in
"cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`.|
-|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct
values of expr, which can be a regular column or an [HLL
sketch](../development/extensions-core/datasketches-hll.md) column. The `lgK`
and `tgtHllType` parameters are described in the HLL sketch documentation. This
is always approximate, regardless of the value of
"useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of
expr, which can be a regular column or a [Theta
sketch](../development/extensions-core/datasketches-theta.md) column. The
`size` parameter is described in the Theta sketch documentation. This is always
approximate, regardless of the value of "useApproximateCountDistinct". See also
`COUNT(DISTINCT expr)`. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL
sketch](../development/extensions-core/datasketches-hll.md) on the values of
expr, which can be a regular column or a column containing HLL sketches. The
`lgK` and `tgtHllType` parameters are described in the HLL sketch
documentation. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`DS_THETA(expr, [size])`|Creates a [Theta
sketch](../development/extensions-core/datasketches-theta.md) on the values of
expr, which can be a regular column or a column containing Theta sketches. The
`size` parameter is described in the Theta sketch documentation. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate
quantiles on numeric or
[approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator)
exprs. The "probability" should be between 0 and 1 (exclusive). The
"resolution" is the number of centroids to use for the computation. Higher
resolutions will give more precise results but also have higher overhead. If
not provided, the default resolution is 50. The [approximate histogram
extension](../development/extensions-core/approximate-histograms.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE_DS(expr, probability, [k])`|Computes approximate quantiles
on numeric or [Quantiles
sketch](../development/extensions-core/datasketches-quantiles.md) exprs. The
"probability" should be between 0 and 1 (exclusive). The `k` parameter is
described in the Quantiles sketch documentation. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit,
upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric
or [fixed buckets
histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram)
exprs. The "probability" should be between 0 and 1 (exclusive). The
`numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters
are described in the fixed buckets histogram documentation. The [approximate
histogram extension](../development/extensions-core/approximate-histograms.md)
must be loaded to use this function.|
-|`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles
sketch](../development/extensions-core/datasketches-quantiles.md) on the values
of expr, which can be a regular column or a column containing quantiles
sketches. The `k` parameter is described in the Quantiles sketch documentation.
The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced
by `expr`, with `numEntries` as the maximum number of distinct values before the
false positive rate increases. See [bloom filter
extension](../development/extensions-core/bloom-filter.md) documentation for
additional details.|
-|`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest
sketch on values produced by `expr` and returns the value for the quantile.
Compression parameter (default value 100) determines the accuracy and size of
the sketch. Higher compression means higher accuracy but more space to store
sketches. See [t-digest
extension](../development/extensions-contrib/tdigestsketch-quantiles.md)
documentation for additional details.|
-|`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on
values produced by `expr`. Compression parameter (default value 100) determines
the accuracy and size of the sketch. Higher compression means higher accuracy
but more space to store sketches. See [t-digest
extension](../development/extensions-contrib/tdigestsketch-quantiles.md)
documentation for additional details.|
-|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See
[stats extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`EARLIEST(expr)`|Returns the earliest value of `expr`, which must be numeric.
If `expr` comes from a relation with a timestamp column (like a Druid
datasource) then "earliest" is the value first encountered with the minimum
overall timestamp of all values being aggregated. If `expr` does not come from
a relation with a timestamp, then it is simply the first value encountered.|
-|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings.
The `maxBytesPerString` parameter determines how much aggregation space to
allocate per string. Strings longer than this limit will be truncated. This
parameter should be set as low as possible, since high values will lead to
wasted memory.|
-|`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If
`expr` comes from a relation with a timestamp column (like a Druid datasource)
then "latest" is the value last encountered with the maximum overall timestamp
of all values being aggregated. If `expr` does not come from a relation with a
timestamp, then it is simply the last value encountered.|
-|`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The
`maxBytesPerString` parameter determines how much aggregation space to allocate
per string. Strings longer than this limit will be truncated. This parameter
should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be
numeric. This aggregator can improve performance by returning the first value
it encounters (including null).|
-|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings.
The `maxBytesPerString` parameter determines how much aggregation space to
allocate per string. Strings longer than this limit will be truncated. This
parameter should be set as low as possible, since high values will lead to
wasted memory.|
-|`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy
dimension is included in a row, when using `GROUPING SETS`. Refer to
[additional documentation](aggregations.md#grouping-aggregator) on how to infer
this number.|
-|`ARRAY_AGG(expr, [size])`|Collects all values of `expr` into an ARRAY,
including null values, with a `size`-byte limit on the aggregation size (default
of 1024 bytes). Use of `ORDER BY` within the `ARRAY_AGG` expression is not
currently supported, and the ordering of results within the output array may
vary depending on processing order.|
-|`ARRAY_AGG(DISTINCT expr, [size])`|Collects all distinct values of `expr`
into an ARRAY, including null values, with a `size`-byte limit on the aggregation
size (default of 1024 bytes) per aggregate. Use of `ORDER BY` within the
`ARRAY_AGG` expression is not currently supported, and the ordering of results
within the output array may vary depending on processing order.|
+|Function|Notes|Default|
+|--------|-----|-------|
+|`COUNT(*)`|Counts the number of rows.|`0`|
+|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string,
numeric, or hyperUnique. By default this is approximate, using a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To
get exact counts set "useApproximateCountDistinct" to "false". If you do this,
expr must be string or numeric, since exact counts are not possible using
hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode,
only one distinct count per query is permitted unless
`useGroupingSetForExactDistinct` is set to true in query contexts or broker
configurations.|`0`|
+|`SUM(expr)`|Sums numbers.|`null` if
`druid.generic.useDefaultValueForNull=false`, otherwise `0`|
+|`MIN(expr)`|Takes the minimum of numbers.|`null` if
`druid.generic.useDefaultValueForNull=false`, otherwise `9223372036854775807`
(maximum LONG value)|
+|`MAX(expr)`|Takes the maximum of numbers.|`0` in 'default' mode, `null` in
SQL compatible mode|
Review comment:
Missed this one?
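Tangentially, the note in the hunk above about non-commutative aggregation of floats and the `ROUND` workaround can be illustrated with a minimal Python sketch (plain floating-point addition, not Druid itself):

```python
# Floating-point addition is not associative, so the order in which
# segments happen to be aggregated can change the result of the same query.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]   # 0.6000000000000001
right_to_left = values[0] + (values[1] + values[2])   # 0.6

assert left_to_right != right_to_left

# Rounding smooths out the discrepancy, as the docs suggest.
assert round(left_to_right, 9) == round(right_to_left, 9) == 0.6
```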
##########
File path: docs/querying/sql.md
##########
@@ -313,46 +313,48 @@ possible for two aggregators in the same SQL query to
have different filters.
Only the COUNT aggregation can accept DISTINCT.
+When no rows are selected, aggregate functions will return their initialized
value for the grouping they belong to. What this value is exactly for a given
aggregator is dependent on the configuration of Druid's SQL compatible null
handling mode, controlled by `druid.generic.useDefaultValueForNull`. The table
below defines the initial values for all aggregate functions in both modes.
+
> The order of aggregation operations across segments is not deterministic.
> This means that non-commutative aggregation
> functions can produce inconsistent results across runs of the same query.
>
> Functions that operate on an input type of "float" or "double" may also see
> these differences in aggregation
> results across multiple query runs because of this. If precisely the same
> value is desired across multiple query runs,
> consider using the `ROUND` function to smooth out the inconsistencies
> between queries.
-|Function|Notes|
-|--------|-----|
-|`COUNT(*)`|Counts the number of rows.|
-|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string,
numeric, or hyperUnique. By default this is approximate, using a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To
get exact counts set "useApproximateCountDistinct" to "false". If you do this,
expr must be string or numeric, since exact counts are not possible using
hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode,
only one distinct count per query is permitted unless
`useGroupingSetForExactDistinct` is set to true in query contexts or broker
configurations.|
-|`SUM(expr)`|Sums numbers.|
-|`MIN(expr)`|Takes the minimum of numbers.|
-|`MAX(expr)`|Takes the maximum of numbers.|
-|`AVG(expr)`|Averages numbers.|
-|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a
regular column or a hyperUnique column. This is always approximate, regardless
of the value of "useApproximateCountDistinct". This uses Druid's built-in
"cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`.|
-|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct
values of expr, which can be a regular column or an [HLL
sketch](../development/extensions-core/datasketches-hll.md) column. The `lgK`
and `tgtHllType` parameters are described in the HLL sketch documentation. This
is always approximate, regardless of the value of
"useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of
expr, which can be a regular column or a [Theta
sketch](../development/extensions-core/datasketches-theta.md) column. The
`size` parameter is described in the Theta sketch documentation. This is always
approximate, regardless of the value of "useApproximateCountDistinct". See also
`COUNT(DISTINCT expr)`. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL
sketch](../development/extensions-core/datasketches-hll.md) on the values of
expr, which can be a regular column or a column containing HLL sketches. The
`lgK` and `tgtHllType` parameters are described in the HLL sketch
documentation. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`DS_THETA(expr, [size])`|Creates a [Theta
sketch](../development/extensions-core/datasketches-theta.md) on the values of
expr, which can be a regular column or a column containing Theta sketches. The
`size` parameter is described in the Theta sketch documentation. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate
quantiles on numeric or
[approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator)
exprs. The "probability" should be between 0 and 1 (exclusive). The
"resolution" is the number of centroids to use for the computation. Higher
resolutions will give more precise results but also have higher overhead. If
not provided, the default resolution is 50. The [approximate histogram
extension](../development/extensions-core/approximate-histograms.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE_DS(expr, probability, [k])`|Computes approximate quantiles
on numeric or [Quantiles
sketch](../development/extensions-core/datasketches-quantiles.md) exprs. The
"probability" should be between 0 and 1 (exclusive). The `k` parameter is
described in the Quantiles sketch documentation. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit,
upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric
or [fixed buckets
histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram)
exprs. The "probability" should be between 0 and 1 (exclusive). The
`numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters
are described in the fixed buckets histogram documentation. The [approximate
histogram extension](../development/extensions-core/approximate-histograms.md)
must be loaded to use this function.|
-|`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles
sketch](../development/extensions-core/datasketches-quantiles.md) on the values
of expr, which can be a regular column or a column containing quantiles
sketches. The `k` parameter is described in the Quantiles sketch documentation.
The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|
-|`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced
by `expr`, with `numEntries` as the maximum number of distinct values before the
false positive rate increases. See [bloom filter
extension](../development/extensions-core/bloom-filter.md) documentation for
additional details.|
-|`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest
sketch on values produced by `expr` and returns the value for the quantile.
Compression parameter (default value 100) determines the accuracy and size of
the sketch. Higher compression means higher accuracy but more space to store
sketches. See [t-digest
extension](../development/extensions-contrib/tdigestsketch-quantiles.md)
documentation for additional details.|
-|`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on
values produced by `expr`. Compression parameter (default value 100) determines
the accuracy and size of the sketch. Higher compression means higher accuracy
but more space to store sketches. See [t-digest
extension](../development/extensions-contrib/tdigestsketch-quantiles.md)
documentation for additional details.|
-|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See
[stats extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats
extension](../development/extensions-core/stats.md) documentation for
additional details.|
-|`EARLIEST(expr)`|Returns the earliest value of `expr`, which must be numeric.
If `expr` comes from a relation with a timestamp column (like a Druid
datasource) then "earliest" is the value first encountered with the minimum
overall timestamp of all values being aggregated. If `expr` does not come from
a relation with a timestamp, then it is simply the first value encountered.|
-|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings.
The `maxBytesPerString` parameter determines how much aggregation space to
allocate per string. Strings longer than this limit will be truncated. This
parameter should be set as low as possible, since high values will lead to
wasted memory.|
-|`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If
`expr` comes from a relation with a timestamp column (like a Druid datasource)
then "latest" is the value last encountered with the maximum overall timestamp
of all values being aggregated. If `expr` does not come from a relation with a
timestamp, then it is simply the last value encountered.|
-|`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The
`maxBytesPerString` parameter determines how much aggregation space to allocate
per string. Strings longer than this limit will be truncated. This parameter
should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be
numeric. This aggregator can improve performance by returning the first value
it encounters (including null).|
-|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings.
The `maxBytesPerString` parameter determines how much aggregation space to
allocate per string. Strings longer than this limit will be truncated. This
parameter should be set as low as possible, since high values will lead to
wasted memory.|
-|`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy
dimension is included in a row, when using `GROUPING SETS`. Refer to
[additional documentation](aggregations.md#grouping-aggregator) on how to infer
this number.|
+|Function|Notes|Default|
+|--------|-----|-------|
+|`COUNT(*)`|Counts the number of rows.|`0`|
+|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string,
numeric, or hyperUnique. By default this is approximate, using a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To
get exact counts set "useApproximateCountDistinct" to "false". If you do this,
expr must be string or numeric, since exact counts are not possible using
hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode,
only one distinct count per query is permitted unless
`useGroupingSetForExactDistinct` is set to true in query contexts or broker
configurations.|`0`|
+|`SUM(expr)`|Sums numbers.|`0` in 'default' mode, `null` in SQL compatible
mode|
+|`MIN(expr)`|Takes the minimum of numbers.|`Long.MAX_VALUE` in 'default' mode,
`null` in SQL compatible mode|
+|`MAX(expr)`|Takes the maximum of numbers.|`Long.MIN_VALUE` in 'default' mode,
`null` in SQL compatible mode|
+|`AVG(expr)`|Averages numbers.|`0` in 'default' mode, `null` in SQL compatible
mode|
+|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a
regular column or a hyperUnique column. This is always approximate, regardless
of the value of "useApproximateCountDistinct". This uses Druid's built-in
"cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT
expr)`.|`0`|
+|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct
values of expr, which can be a regular column or an [HLL
sketch](../development/extensions-core/datasketches-hll.md) column. The `lgK`
and `tgtHllType` parameters are described in the HLL sketch documentation. This
is always approximate, regardless of the value of
"useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|`0`|
+|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of
expr, which can be a regular column or a [Theta
sketch](../development/extensions-core/datasketches-theta.md) column. The
`size` parameter is described in the Theta sketch documentation. This is always
approximate, regardless of the value of "useApproximateCountDistinct". See also
`COUNT(DISTINCT expr)`. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|`0`|
+|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL
sketch](../development/extensions-core/datasketches-hll.md) on the values of
expr, which can be a regular column or a column containing HLL sketches. The
`lgK` and `tgtHllType` parameters are described in the HLL sketch
documentation. The [DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|`'0'` (STRING)|
+|`DS_THETA(expr, [size])`|Creates a [Theta
sketch](../development/extensions-core/datasketches-theta.md) on the values of
expr, which can be a regular column or a column containing Theta sketches. The
`size` parameter is described in the Theta sketch documentation. The
[DataSketches
extension](../development/extensions-core/datasketches-extension.md) must be
loaded to use this function.|`'0.0'` (STRING)|
Review comment:
> Hmm, it actually returns a double, but we don't examine the finalized
type so calcite thinks it is complex which is I guess how it ends up as a
string instead of double because it just tries to serialize complex values (and
I assume the same is true of the other sketches that return a string result).
Hmm, weird, but, OK. We might want to change it to something saner later,
but we don't have to do that right now.
> I guess this also brings up the question of if we need to describe the
difference between intermediary types and finalized types here
I think the way you did it is right.
I don't think we need to describe the intermediary / finalized type
difference in user-facing documentation. That should be an internal detail. IMO
if there's cases where users might benefit from seeing the non-finalized
values, it'd be better to expose them through postaggregators that have well
defined semantics (like sketch_bytes_as_base64 or something).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]