paul-rogers commented on PR #12851: URL: https://github.com/apache/druid/pull/12851#issuecomment-1203345602
@clintropolis, thanks for this fix, you've done some great foundation work that the catalog project will build upon. The catalog is presently focused on aggregates, but there seems to be some interesting synergy with the JSON columns. Let's think a bit about the actual type names. There are three types to consider: * The input type (what can go into a complex column such as JSON or an aggregate) * The storage type (in segments) * The output type of queries (after "finalizing" the aggregates). The current (very preliminary) thinking is for the catalog to use an "aggregate type" that describes all of the above (and the "seed", "combiner" and "reducer" functions.) The storage type for an aggregate is, I gather, either a simple type (`long` for `SUM(<long value>)` or a complex type (for things like histograms.) We want the SQL type to describe the aggregate itself, not just the storage format. So, we've been playing with type names such as `SUM(BIGINT)` or `HISTOGRAM(VARCHAR)`. The idea is that the type name acts as a key into a table that gives is the storage type, the "seed" function (to convert a single row to an aggregate), the "combiner" function (used to combine aggregates), and a "reducer" (used to convert the aggregate to a simple value returned by queries.) The idea is that the user provides the type name (in whatever syntax we decide on), and the system then works out all the internal bits for both `INSERT` and `SELECT` statements. The existing `COMPLEX<foo>` pattern doesn't quite capture the info we need, hence the idea of `<agg>(<type>)`, including multi-argument forms. Returning to JSON, I wonder if the same ideas apply? Looking at how other products handle JSON, [SqlServer](https://docs.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16) uses type `VARCHAR`. [MySQL](https://dev.mysql.com/doc/refman/8.0/en/json.html) uses type `JSON`: ```sql CREATE TABLE t1 (jdoc JSON) ``` Druid has no `CREATE TABLE` statement, but the catalog will bring us a step closer: we need a type in the catalog that says that column `jdoc` holds JSON data. The suggestion here is to maybe use `COMPLEX<json>`. The hint from MySQL is to just use `JSON`. There is mention in the description of the row signature of a `SELECT`. This is also an interesting case. For that, we might want to learn from SqlServer. If what we return is a string, then the query output type should be a simple `VARCHAR` since the return value looks like a `VARCHAR`, quacks like a `VARCHAR` and actually *is* a `VARCHAR` to JDBC. That is, the "reduced" form of JSON is just a `VARCHAR`: it looses all its special "JSON-ness" when returned to the client. Hence, the planner has to know that, when selecting a column of type `JSON` (or `COMPLEX<json>`), the output type is `VARCHAR`. All of that is probably a bit outside the scope of this particular PR, but it seems like we'll soon need to resolve these questions for both the catalog and JSON projects, and that solution will leverage the work you've done here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
