paul-rogers opened a new pull request, #13686:
URL: https://github.com/apache/druid/pull/13686

   This PR is currently a draft while merge conflicts are resolved after 
splitting out some of the code to other PRs.
   
   Prior PRs added the catalog (table metadata) foundations, and an improved 
set of table functions. This PR brings it all together:
   
   * Validates the MSQ INSERT and REPLACE statements against the catalog
     * Clustering, partitioning and other table details can be set in the 
catalog instead of the SQL statement
     * Catalog types are loosely enforced for MSQ. (More work is needed to 
precisely enforce types.)
     * The catalog can create a "sealed" table: only columns defined in the 
catalog can be used in MSQ.
   * Allows defining external tables and partial external tables (AKA 
"connections") in the catalog, then filling in the remaining details at 
runtime via a table function.
   * Allows parameters (including array parameters) to work with MSQ queries
   * Extends the PARTITION BY clause to accept string literals for the time 
partitioning
   * Extends MSQ to give the planner control over the type of the emitted 
segment columns
   * Adds MSQ integration tests (ITs) to validate the new "ad-hoc" table functions
   * Documentation
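
The "sealed" table rule above can be pictured with a minimal sketch. This is not the code in this PR: the class name `SealedTableCheck`, the method `undefinedColumns`, and the column names are all hypothetical, and the real check runs inside the SQL validator rather than on plain collections. It only illustrates the idea that a sealed table rejects any target column the catalog does not define.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not a class from the Druid code base.
public class SealedTableCheck {

    /**
     * Returns the columns of an INSERT/REPLACE that a sealed table would reject.
     * For a non-sealed table nothing is rejected, so the list is empty.
     */
    public static List<String> undefinedColumns(
            Set<String> catalogColumns,
            List<String> insertColumns,
            boolean sealed) {
        List<String> rejected = new ArrayList<>();
        if (!sealed) {
            return rejected;
        }
        for (String column : insertColumns) {
            if (!catalogColumns.contains(column)) {
                rejected.add(column);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Assumed example schema: three columns defined in the catalog.
        Set<String> catalog = Set.of("__time", "host", "bytes");
        // The statement tries to write a column the catalog does not know.
        List<String> insert = List.of("__time", "host", "latency");

        System.out.println(undefinedColumns(catalog, insert, true));  // [latency]
        System.out.println(undefinedColumns(catalog, insert, false)); // []
    }
}
```

With `sealed` set to `false`, the extra column passes through, which matches the description above: only sealed tables restrict MSQ to catalog-defined columns.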
   
   To allow all the above to work:
   
   * Validation for MSQ statements moves out of the handlers into a 
Druid-specific version of the SQL validator.
   * Adds a Druid-specific Calcite operator to represent a Druid ingest.
   * The catalog API is passed into the Druid planner (which required changes 
in the many tests that set up the planner).
   * The catalog can now be enabled in the Broker to allow the planner to 
interact with the Druid table metadata extension.
   * Many new tests to verify the catalog integration and improved MSQ 
statement validation.
   * Improved catalog type parsing in anticipation of supporting complex types.
   * Factored out the "per run" items from the planner into a planner toolbox, 
leaving just the "per session" items in the planner.
   * Resource shuttle now handles "partial table functions" for items defined 
in the catalog.
   
   #### Release note
   
   This PR introduces the full catalog functionality. See the documentation 
files for the details. In this version, the catalog is an extension: you must 
enable the catalog extension to use the catalog. Enabling the extension creates 
an additional table in your metadata database. We consider the catalog to be 
experimental, and the metadata table schema is subject to change.
   
   Table functions, introduced in a prior PR, are production ready and 
independent of the catalog. "Partial table functions" (define some of the 
properties in the catalog, some in SQL) are new in this PR and are 
experimental, along with the catalog itself.
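
As a rough mental model of a partial table function, the catalog entry supplies some properties and the SQL call supplies the rest. The sketch below is an assumption-laden illustration, not the PR's implementation: the class `PartialTableSketch`, the method `resolve`, and the property names (`source`, `format`, `uris`) are hypothetical, and the rule that a SQL argument may not override a catalog property is an assumption made here for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not a class from the Druid code base.
public class PartialTableSketch {

    /**
     * Combines properties stored in the catalog with arguments given in SQL.
     * Assumption: a property fixed in the catalog cannot be set again in SQL.
     */
    public static Map<String, Object> resolve(
            Map<String, Object> catalogProps,
            Map<String, Object> sqlArgs) {
        Map<String, Object> merged = new HashMap<>(catalogProps);
        for (Map.Entry<String, Object> arg : sqlArgs.entrySet()) {
            if (merged.containsKey(arg.getKey())) {
                throw new IllegalArgumentException(
                    "Property already set in catalog: " + arg.getKey());
            }
            merged.put(arg.getKey(), arg.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // The "connection" defined in the catalog: where and how to read.
        Map<String, Object> catalog = Map.of("source", "http", "format", "csv");
        // The remaining detail supplied at run time via the table function.
        Map<String, Object> sql = Map.of("uris", "https://example.com/data.csv");

        Map<String, Object> table = resolve(catalog, sql);
        System.out.println(table.get("source") + " " + table.get("uris"));
    }
}
```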
   
   #### Hints to reviewers
   
   Much of this PR is doc files, test code and minor cleanup. The core changes 
(those that could break a running system if done wrong) are:
   
   * `extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/*`
   * `sql/src/main/*`
   
   The real core of this PR is 
`sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java`, 
which now holds the formerly ad-hoc `INSERT` and `REPLACE` validation so that 
it runs within the SQL validator.
   
   No runtime code was changed: all the non-trivial changes are in the SQL 
planner.
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [X] a release note entry in the PR description.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for 
[code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) 
is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

