Re: [PR] consolidate json and auto indexers, remove v4 nested column serializer (druid)

via GitHub Wed, 12 Jul 2023 13:00:45 -0700


ektravel commented on code in PR #14456:
URL: https://github.com/apache/druid/pull/14456#discussion_r1261667605



##########
docs/querying/nested-columns.md:
##########
@@ -23,12 +23,14 @@ sidebar_label: Nested columns
   ~ under the License.
   -->
 
-Apache Druid supports directly storing nested data structures in 
`COMPLEX<json>` columns. `COMPLEX<json>` columns store a copy of the structured 
data in JSON format and specialized internal columns and indexes for nested 
literal values&mdash;STRING, LONG, and DOUBLE types. An optimized [virtual 
column](./virtual-columns.md#nested-field-virtual-column) allows Druid to read 
and filter these values at speeds consistent with standard Druid LONG, DOUBLE, 
and STRING columns.
+Apache Druid supports directly storing nested data structures in 
`COMPLEX<json>` columns. `COMPLEX<json>` columns store a copy of the structured 
data in JSON format and specialized internal columns and indexes for nested 
literal values&mdash;STRING, LONG, and DOUBLE types, as well as ARRAY of 
STRING, LONG, and DOUBLE values. An optimized [virtual 
column](./virtual-columns.md#nested-field-virtual-column) allows Druid to read 
and filter these values at speeds consistent with standard Druid LONG, DOUBLE, 
and STRING columns.
 
 Druid [SQL JSON functions](./sql-json-functions.md) allow you to extract, 
transform, and create `COMPLEX<json>` values in SQL queries, using the 
specialized virtual columns where appropriate. You can use the [JSON nested 
columns functions](math-expr.md#json-functions) in [native 
queries](./querying.md) using [expression virtual 
columns](./virtual-columns.md#expression-virtual-column), and in native 
ingestion with a 
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec).
 
 You can use the JSON functions in INSERT and REPLACE statements in SQL-based 
ingestion, or in a `transformSpec` in native ingestion as an alternative to 
using a [`flattenSpec`](../ingestion/data-formats.md#flattenspec) object to 
"flatten" nested data for ingestion.
 
+Columns ingested as `COMPLEX<json>` are automatically optimized to store the 
most appropriate physical column based on the data processed. For example, if 
only LONG values are processed, Druid will store a LONG column, ARRAY columns 
if the data consists of arrays, or `COMPLEX<json>` in the general case if the 
data is actually nested. This is the same functionality that powers ['type 
aware' schema 
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).

Review Comment:
   ```suggestion
   Columns ingested as `COMPLEX<json\>` are automatically optimized to store 
the most appropriate physical column based on the data processed. For example, 
if only LONG values are processed, Druid stores a LONG column, ARRAY columns if 
the data consists of arrays, or `COMPLEX<json\>` in the general case if the 
data is actually nested. This is the same functionality that powers ['type 
aware' schema 
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
   ```
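
For readers of this thread, the selection rule described in the paragraph can be illustrated with a toy model. This is only a sketch of the documented behavior, not Druid's implementation: the function name `pick_physical_column` is hypothetical, Python types stand in for Druid's value types, and the sketch returns `None` for mixtures the paragraph does not describe.

```python
def pick_physical_column(values):
    """Toy model of the rule in the paragraph above (not Druid code).

    Returns a label for the column type the paragraph describes, or None
    for inputs the paragraph does not cover (empty or mixed scalar types).
    """
    kinds = set()
    for value in values:
        if isinstance(value, dict):
            # Actually nested data: the general case.
            return "COMPLEX<json>"
        if isinstance(value, list):
            # Assumption: arrays holding objects or arrays count as nested.
            if any(isinstance(item, (dict, list)) for item in value):
                return "COMPLEX<json>"
            kinds.add("ARRAY")
        elif isinstance(value, bool):
            # bool is a subclass of int in Python; keep it out of LONG.
            kinds.add("OTHER")
        elif isinstance(value, int):
            kinds.add("LONG")
        elif isinstance(value, float):
            kinds.add("DOUBLE")
        elif isinstance(value, str):
            kinds.add("STRING")
        else:
            kinds.add("OTHER")
    if len(kinds) == 1 and "OTHER" not in kinds:
        return kinds.pop()
    # Mixed or unsupported input: the paragraph does not say what happens.
    return None


if __name__ == "__main__":
    print(pick_physical_column([1, 2, 3]))         # only LONG values
    print(pick_physical_column([[1, 2], [3]]))     # only arrays
    print(pick_physical_column([{"a": 1}, 2]))     # nested data
```

How Druid resolves mixed scalar types is outside what the paragraph states, so the sketch deliberately leaves that case undefined.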



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
