paul-rogers opened a new pull request, #13165: URL: https://github.com/apache/druid/pull/13165
The [Druid catalog](https://github.com/apache/druid/issues/12546) provides a collection of metadata "hints" about tables (datasources, input sources, views, etc.) within Druid. This PR provides the foundation: the DB and REST layer, but not yet the integration with the Calcite SQL layer. This is a much-refined version of the [earlier catalog PR](https://github.com/apache/druid/pull/12647).

The DB layer extends what is done for other Druid metadata tables. The semantic ("business logic") layer provides the usual CRUD operations on tables. The entire design is pretty standard and follows Druid patterns. The key difference is the rather extreme lengths taken by the implementation to ensure each bit is easily testable without mocks. That means many interfaces which can be implemented in multiple ways.

Parts available in this PR include:

* The metadata DB storage layer (in an extension)
* The basic "catalog object model" that describes the properties and columns which describe catalog tables.
* A basic set of tables: two kinds of datasources (detail and rollup) and three kinds of external tables (inline, local, and HTTP).
* A REST API layer to perform CRUD operations on tables.
* Unit tests
* An integration test of the catalog REST API.

The catalog mechanism is split into two parts.

* The "core" part which describes catalog objects, and which can model data from a variety of catalog systems.
* The `druid-catalog` extension which stores data in the Druid metadata database.

This split exists for two reasons:

* Many metadata systems exist: HMS, Amazon Glue, various commercial solutions, etc. We anticipate that some shops may wish to obtain metadata from these other systems, in the same way that some shops get their security information from external systems.
* Druid's database schema evolution system is rather basic. (And, by "basic", we mean "nonexistent.") There is some chance that the remaining development will change the schema, which upgrades cannot support.
Users who enable the `druid-catalog` extension now acknowledge that they are using it only for testing, not production, and at their own risk.

Functionality not in this PR, but which will appear in the next one, includes:

* The synchronization mechanism between the Coordinator and Broker.
* SQL table functions to make use of catalog entries.
* Integration of catalog properties to simplify MSQ ingest statements (`INSERT` and `REPLACE`).
* Integration of catalog schema information with `SELECT` queries.
* The remaining set of external table types.
* Views.

This is a great opportunity for reviewers to provide guidance on the basic catalog mechanism before we start building SQL integration on top.

Please see the [Druid catalog issue](https://github.com/apache/druid/issues/12546) for additional details about the goals and design of the catalog.

<hr>

This PR has:

- [X] been self-reviewed.
- [X] has a design document [here](https://github.com/apache/druid/issues/12546).
- [ ] added documentation for new or modified features or behaviors. (Not yet: the functionality is not yet user visible.)
- [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [X] added integration tests.
- [ ] been tested in a test Druid cluster.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
