paul-rogers commented on PR #13627:
URL: https://github.com/apache/druid/pull/13627#issuecomment-1371761837

   To anyone who wants to review, here is a summary of this PR, originally 
provided via a private channel.
   
   Let's assume you are already familiar with the model layer from last time. 
The key changes here are to make table functions more generic. First, there is 
a table function abstraction: something that has a set of parameter 
definitions, accepts a set of arguments, and returns an external table 
definition. The table function replaces the property annotations from the prior 
PR, so the annotation stuff was removed. Before, there was a definition for 
each kind of external table. This was too restrictive. So, now there is a 
generic external table definition that has two "parts". One is a definition of 
the input source, the other is a list of possible input formats.
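
   To make the shape of the abstraction concrete, here is a minimal Java sketch. All names here (`TableFunction`, `ParameterDefn`, `ExternalTableDefn`) are illustrative stand-ins, not the actual classes in this PR:

   ```java
   import java.util.List;
   import java.util.Map;

   // Illustrative sketch of the table function abstraction: something that
   // has a set of parameter definitions, accepts a set of arguments, and
   // returns an external table definition. Names are hypothetical.
   interface TableFunction {
     List<ParameterDefn> parameters();
     ExternalTableDefn apply(Map<String, Object> args);
   }

   // A parameter definition: name, type, and whether the argument is optional.
   class ParameterDefn {
     final String name;
     final Class<?> type;
     final boolean optional;

     ParameterDefn(String name, Class<?> type, boolean optional) {
       this.name = name;
       this.type = type;
       this.optional = optional;
     }
   }

   // The generic external table definition with its two "parts": the input
   // source definition and the list of possible input formats.
   class ExternalTableDefn {
     final Map<String, Object> inputSourceProps;
     final List<String> inputFormats;

     ExternalTableDefn(Map<String, Object> inputSourceProps, List<String> inputFormats) {
       this.inputSourceProps = inputSourceProps;
       this.inputFormats = inputFormats;
     }
   }
   ```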
   
   Each input source has its own metadata class which says which properties are 
available (as before), but also says what table functions are available. There 
are three kinds of functions. 1) An "ad-hoc" (from scratch) function that is a 
fancy form of the existing extern function. 2) A "partial" function, where some 
of the data resides in the catalog and the rest is given by the query, using 
the table name as a function name. 3) A "complete" function, where the catalog 
table name can be used either as a table or as a zero-argument table function. 
The input
source metadata specifies the various properties, how function arguments are 
mapped to properties, and how to merge catalog and argument values. Formats are 
similar, but are shared by all input sources.
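
   As a rough illustration of the "partial" case, the merge of catalog-provided properties with query-supplied arguments can be pictured like this. This is a simplified sketch with invented names; in the PR, the merge rules are defined per property by the input source metadata:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Sketch of merging catalog values with function arguments for a
   // "partial" table function: the catalog holds some properties, the
   // query call fills in the rest. Hypothetical, simplified logic.
   class CatalogArgMerge {
     static Map<String, Object> merge(Map<String, Object> catalogProps,
                                      Map<String, Object> queryArgs) {
       Map<String, Object> merged = new HashMap<>(catalogProps);
       // Query arguments complete (or override) the catalog values.
       merged.putAll(queryArgs);
       return merged;
     }
   }
   ```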
   
   Given this, the second part: the SQL-layer table functions are redone to 
use the table function abstraction and the input source definition. Basically, 
the HTTP input source function holds onto the HTTP input source definition and 
asks that definition for the table function definition, which then provides 
the list of parameters and processes the actual call. Given that, the related 
code is kind of one big thing: it won't work to use, say, the old Calcite code 
with the new model code or vice versa.
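
   The delegation might be pictured roughly as follows. Again, all names are invented for illustration, not the actual classes in the PR:

   ```java
   import java.util.List;
   import java.util.Map;

   // Sketch of the SQL-layer delegation: the Calcite-facing table macro
   // holds an input source definition and asks it for the table function,
   // which supplies the parameter list and processes the call.
   class InputSourceDefn {
     // The input source definition knows which table function it offers.
     TableFunctionDefn adHocTableFn() {
       return new TableFunctionDefn(List.of("uri", "format"));
     }
   }

   class TableFunctionDefn {
     final List<String> parameterNames;

     TableFunctionDefn(List<String> parameterNames) {
       this.parameterNames = parameterNames;
     }

     // Process the actual call: turn arguments into an external table spec.
     Map<String, Object> processCall(Map<String, Object> args) {
       return Map.copyOf(args);
     }
   }

   // The Calcite-facing function merely holds the definition and delegates.
   class HttpTableMacro {
     private final InputSourceDefn inputSource = new InputSourceDefn();

     List<String> parameterNames() {
       return inputSource.adHocTableFn().parameterNames;
     }

     Map<String, Object> apply(Map<String, Object> args) {
       return inputSource.adHocTableFn().processCall(args);
     }
   }
   ```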
   
   You can start here: 
`server/src/main/java/org/apache/druid/catalog/model/table`. That's the core. 
Then, spiral outwards. `server/src/main/java/org/apache/druid/catalog/model` 
has a bunch of supporting changes. 
`sql/src/main/java/org/apache/druid/sql/calcite/external` is the Calcite 
integration for the table functions. Most of this stuff is either a complete 
mystery (because you don't know the bizarre ways that Calcite handles table 
macros) or trivial (because it builds on the code mentioned above). Another big 
wad of files are the corresponding tests.
   
   Oh, and if you want to know what all this code actually does, look at the 
`reference.md` documentation file.
   
   A follow-on PR will add catalog integration to the Calcite planner so that 
we incorporate catalog information when planning an MSQ query. There are some 
bits toward that goal in this PR (where they overlap with external tables). 
However, to keep this PR from growing bigger, the planner work, which uses 
table specs from the catalog, comes later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

