clintropolis commented on code in PR #16673: URL: https://github.com/apache/druid/pull/16673#discussion_r1680091282
##########
docs/querying/sql-data-types.md:
##########
@@ -34,12 +34,23 @@ Druid associates each column with a specific data type. This topic describes sup
 Druid natively supports the following basic column types:
-* LONG: 64-bit signed int
-* FLOAT: 32-bit float
-* DOUBLE: 64-bit float
-* STRING: UTF-8 encoded strings and string arrays
-* COMPLEX: non-standard data types, such as nested JSON, hyperUnique and approxHistogram, and DataSketches
-* ARRAY: arrays composed of any of these types
+* `LONG`: 64-bit signed int
+* `FLOAT`: 32-bit float
+* `DOUBLE`: 64-bit float
+* `STRING`: UTF-8 encoded strings and string arrays
+* `ARRAY`: arrays composed of any of these types
+
+## Complex types
+
+Druid natively supports the following complex types:
+* `COMPLEX<JSON>`: stores a copy of structured data in JSON format and specialized internal columns and indexes for nested basic types. Click here to learn more about [`COMPLEX<JSON>`](nested-columns.md)
+* `cardinality`: Data structure to compute the cardinality of Apache Druid dimensions using the HyperLogLog algorithm. Click here to learn more about [`cardinality`](hll-old.md#cardinality-aggregator)
+* `hyperUnique`: Data structure of aggregated values to estimate count distinct using a variant of the HyperLogLog approximation algorithm. Consider using HLL sketches for better accuracy in many cases. Click here to learn more about [`hyperUnique`](hll-old.md#hyperunique-aggregator)

Review Comment:
if we are going to mention json as `COMPLEX<json>`, then this should probably be `COMPLEX<hyperUnique>`? Also it's strange to tell someone to use something and not also include a description of that type...

I'm not sure how we should handle extension `COMPLEX` types here, since there are a lot of them and I don't know that their docs do a very good job of articulating what type they store if they are ingested into a rollup table

##########
docs/querying/sql-data-types.md:
##########
@@ -34,12 +34,23 @@ Druid associates each column with a specific data type. This topic describes sup
 Druid natively supports the following basic column types:
-* LONG: 64-bit signed int
-* FLOAT: 32-bit float
-* DOUBLE: 64-bit float
-* STRING: UTF-8 encoded strings and string arrays
-* COMPLEX: non-standard data types, such as nested JSON, hyperUnique and approxHistogram, and DataSketches
-* ARRAY: arrays composed of any of these types
+* `LONG`: 64-bit signed int
+* `FLOAT`: 32-bit float
+* `DOUBLE`: 64-bit float
+* `STRING`: UTF-8 encoded strings and string arrays
+* `ARRAY`: arrays composed of any of these types
+
+## Complex types
+
+Druid natively supports the following complex types:
+* `COMPLEX<JSON>`: stores a copy of structured data in JSON format and specialized internal columns and indexes for nested basic types. Click here to learn more about [`COMPLEX<JSON>`](nested-columns.md)
+* `cardinality`: Data structure to compute the cardinality of Apache Druid dimensions using the HyperLogLog algorithm. Click here to learn more about [`cardinality`](hll-old.md#cardinality-aggregator)

Review Comment:
`cardinality` is an aggregator type, not a column type. It builds into a `COMPLEX<hyperUnique>` if stored in a column.

##########
docs/querying/sql-data-types.md:
##########
@@ -64,7 +75,7 @@ The following table describes how Druid maps SQL types onto native types when ru
 |TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, such as `2000-01-02 03:04:05`, not ISO 8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
 |DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting—for example, `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
 |ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See [Arrays](#arrays) for more information.|
-|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
+|OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, cardinality, etc.|

Review Comment:
`cardinality` isn't a type; this should mention that it depends on which extensions are loaded and link to the extension docs. However, afaik none of the extension docs contain information on how the type is displayed in, for example, the `INFORMATION_SCHEMA` columns table, so those docs might also need to be updated to indicate how their types are presented.
