paul-rogers commented on PR #13627: URL: https://github.com/apache/druid/pull/13627#issuecomment-1371761837
For anyone who wants to review, here is a summary of an overview of this PR that was provided via a private channel. Let's assume you are already familiar with the model layer from last time. The key change here is to make table functions more generic. First, there is a table function abstraction: something that has a set of parameter definitions, accepts a set of arguments, and returns an external table definition. The table function replaces the property annotations from the prior PR, so the annotation code was removed.

Before, there was a definition for each kind of external table, which proved too restrictive. Now there is a generic external table definition with two "parts": one is a definition of the input source, the other is a list of possible input formats. Each input source has its own metadata class which says which properties are available (as before), but now also says which table functions are available. There are three kinds of functions:

1. An "ad-hoc" (from scratch) function that is a fancy form of the existing `EXTERN` function.
2. A "partial" function where some of the data resides in the catalog and the rest is given by the query, using the table name as a function name.
3. A "complete" function where the catalog table name can be used either as a table or as a zero-argument table function.

The input source metadata specifies the various properties, how function arguments are mapped to properties, and how to merge catalog and argument values. Formats are similar, but are shared by all input sources.

Given this, the second part is that the SQL-layer table functions are redone to use the table function abstraction and the input source definition. Basically, the HTTP input source function holds onto the HTTP input source definition, asks that definition for the table function definition, and that definition then provides the list of parameters and processes the actual call.
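To make the shape of the abstraction concrete, here is a minimal, hypothetical sketch (the names `ParameterDef`, `ExternalTableSpec`, `TableFunction`, and `HttpTableFunction` are illustrative, not the actual classes in this PR): a table function exposes parameter definitions, accepts a set of argument values, validates them, and resolves them into an external table definition.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the table function abstraction described
// above. Names here are illustrative only; see the PR for the real classes.
public class TableFunctionSketch {

  // A parameter definition: a name, a coarse type, and whether it is optional.
  record ParameterDef(String name, Class<?> type, boolean optional) {}

  // Stand-in for the resolved external table: an input source spec plus a format.
  record ExternalTableSpec(Map<String, Object> inputSource, String format) {}

  // The core abstraction: parameter definitions in, arguments resolved to an
  // external table definition out.
  interface TableFunction {
    List<ParameterDef> parameters();
    ExternalTableSpec apply(Map<String, Object> args);
  }

  // An "ad-hoc" style function for an HTTP-like input source: in this mode,
  // every property comes from the call itself, none from the catalog.
  static class HttpTableFunction implements TableFunction {
    @Override
    public List<ParameterDef> parameters() {
      return List.of(
          new ParameterDef("uris", String.class, false),
          new ParameterDef("format", String.class, false));
    }

    @Override
    public ExternalTableSpec apply(Map<String, Object> args) {
      // Validate arguments against the declared parameters.
      for (ParameterDef p : parameters()) {
        if (!p.optional() && !args.containsKey(p.name())) {
          throw new IllegalArgumentException("Missing argument: " + p.name());
        }
      }
      return new ExternalTableSpec(
          Map.of("type", "http", "uris", args.get("uris")),
          (String) args.get("format"));
    }
  }

  public static void main(String[] argv) {
    TableFunction fn = new HttpTableFunction();
    ExternalTableSpec spec = fn.apply(
        Map.of("uris", "https://example.com/data.csv", "format", "csv"));
    System.out.println(spec.format());
    System.out.println(spec.inputSource().get("type"));
  }
}
```

A "partial" or "complete" function would follow the same interface, except that `apply` would merge the caller's arguments with property values already stored in the catalog rather than requiring everything in the call.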
Given that, the related code is kind of one big thing: it won't work to use, say, the old Calcite code with the new model code, or vice versa. You can start here: `server/src/main/java/org/apache/druid/catalog/model/table`. That's the core. Then spiral outwards. `server/src/main/java/org/apache/druid/catalog/model` has a bunch of supporting changes. `sql/src/main/java/org/apache/druid/sql/calcite/external` is the Calcite integration for the table functions. Most of this stuff is either a complete mystery (because you don't know the bizarre ways that Calcite handles table macros) or trivial (because it builds on the code mentioned above). Another big wad of files are the corresponding tests. Oh, and if you want to know what all this code actually does, look at the `reference.md` documentation file. A follow-on PR will add catalog integration to the Calcite planner so that we incorporate catalog information when planning an MSQ query. There are some bits toward that goal in this PR (where they overlap with external tables). However, to keep this PR from growing bigger, the planner work, which uses table specs from the catalog, comes later.
