paul-rogers opened a new pull request, #13686:
URL: https://github.com/apache/druid/pull/13686
This PR is currently a draft while merge conflicts are resolved after
splitting out some of the code to other PRs.
Prior PRs added the catalog (table metadata) foundations, and an improved
set of table functions. This PR brings it all together:
* Validates the MSQ INSERT and REPLACE statements against the catalog
* Clustering, partitioning and other table details can be set in the
catalog instead of the SQL statement
* Catalog types are loosely enforced for MSQ. (More work is needed to
precisely enforce types.)
* The catalog can create a "sealed" table: only columns defined in the
catalog can be used in MSQ.
* Allows defining external tables and partial external tables (AKA
"connections") in the catalog, then filling in the remaining details at
runtime via a table function.
* Allows parameters (including array parameters) to work with MSQ queries
* Extends the `PARTITIONED BY` clause to accept string literals for the time
partitioning
* Extends MSQ to give the planner control over the type of the emitted
segment columns
* MSQ ITs to validate the new "ad-hoc" table functions
* Documentation
To allow all the above to work:
* Validation for MSQ statements moves out of the handlers into a
Druid-specific version of the SQL validator.
* Druid-specific Calcite operator to represent a Druid ingest.
* The catalog API is passed into the Druid planner (which required changes
in the many tests that set up the planner).
* The catalog can now be enabled in the Broker to allow the planner to
interact with the Druid table metadata extension.
* Many new tests to verify the catalog integration and improved MSQ
statement validation.
* Improved catalog type parsing in anticipation of supporting complex types.
* Factored out the "per run" items from the planner into a planner toolbox,
leaving just the "per session" items in the planner.
* Resource shuttle now handles "partial table functions" for items defined
in the catalog.
#### Release note
This PR introduces the full catalog functionality. See the documentation
files for the details. In this version, the catalog is an extension: you must
enable the catalog extension to use the catalog. Enabling the extension creates
an additional table in your metadata database. We consider the catalog to be
experimental, and the metadata table schema is subject to change.
Table functions, introduced in a prior PR, are production ready and
independent of the catalog. "Partial table functions" (define some of the
properties in the catalog, some in SQL) are new in this PR and are
experimental, along with the catalog itself.
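To make the "partial table function" idea concrete, here is a minimal sketch of how catalog-defined properties and SQL-supplied arguments might be combined. It is an illustration only, not the code in this PR: the class `PartialTableSketch`, the method `resolve`, and the property names are hypothetical, and the rule that a runtime argument may not redefine a catalog property is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of Druid or of this PR.
public class PartialTableSketch {

  // Combines properties stored in the catalog with arguments given in SQL.
  // Assumption: an argument may only fill in a property the catalog left unset.
  public static Map<String, Object> resolve(
      Map<String, Object> catalogProps,
      Map<String, Object> functionArgs) {
    Map<String, Object> merged = new LinkedHashMap<>(catalogProps);
    for (Map.Entry<String, Object> arg : functionArgs.entrySet()) {
      if (merged.containsKey(arg.getKey())) {
        throw new IllegalArgumentException(
            "Property already set in the catalog: " + arg.getKey());
      }
      merged.put(arg.getKey(), arg.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    // Property names below are made up for the example.
    Map<String, Object> catalog = new LinkedHashMap<>();
    catalog.put("source", "http");
    catalog.put("format", "csv");

    Map<String, Object> fnArgs = Map.of("uris", "https://example.com/data.csv");

    System.out.println(resolve(catalog, fnArgs));
  }
}
```

The sketch keeps the catalog entry as the base and treats the table function arguments as the remaining details, which mirrors the split described above.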
#### Hints to reviewers
Much of this PR is doc files, test code and minor cleanup. The core changes
(those that could break a running system if done wrong) are:
* `extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/*`
* `sql/src/main/*`
The real core of this PR is
`sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java`:
the place where we moved the former ad-hoc `INSERT` and `REPLACE` validation
to instead run within the SQL validator.
No runtime code was changed: all the non-trivial changes are in the SQL
planner.
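
As a rough illustration of the kind of check that now runs during validation, the "sealed" table rule from the feature list can be sketched as below. This is a self-contained sketch, not the logic in `DruidSqlValidator`: the class `SealedTableCheck`, its methods, and the error message are hypothetical, and the column names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the logic in DruidSqlValidator.
public class SealedTableCheck {

  // Returns the target columns that the catalog does not declare, in input order.
  public static List<String> undeclaredColumns(
      Set<String> catalogColumns,
      List<String> insertColumns) {
    List<String> missing = new ArrayList<>();
    for (String col : insertColumns) {
      if (!catalogColumns.contains(col)) {
        missing.add(col);
      }
    }
    return missing;
  }

  // Rejects the statement only when the table is sealed and a column is undeclared.
  public static void validate(
      boolean sealed,
      Set<String> catalogColumns,
      List<String> insertColumns) {
    List<String> missing = undeclaredColumns(catalogColumns, insertColumns);
    if (sealed && !missing.isEmpty()) {
      throw new IllegalArgumentException(
          "Columns not defined in sealed table: " + missing);
    }
  }

  public static void main(String[] args) {
    Set<String> catalog = Set.of("__time", "page", "added");
    List<String> insert = List.of("__time", "page", "extra");

    // Not sealed: the extra column is accepted.
    validate(false, catalog, insert);

    // Sealed: the extra column is rejected.
    try {
      validate(true, catalog, insert);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```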
<hr>
This PR has:
- [X] been self-reviewed.
- [X] added documentation for new or modified features or behaviors.
- [X] a release note entry in the PR description.
- [X] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code
wherever it would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]