paul-rogers commented on issue #12546:
URL: https://github.com/apache/druid/issues/12546#issuecomment-1182489512
## Extended Table Functions
The present version of Druid uses a Calcite feature to specify an ingest
input table:
```sql
INSERT INTO dst SELECT *
FROM TABLE(extern(
'{
"type": "inline",
"data": "a,b,1\nc,d,2\n"
}',
'{
"type": "csv",
"columns": ["x","y","z"],
"listDelimiter": null,
"findColumnsFromHeader": false,
"skipHeaderRows": 0
}',
'[
{"name": "x", "type": "STRING"},
{"name": "y", "type": "STRING"},
{"name": "z", "type": "LONG"}
]'
))
PARTITIONED BY ALL TIME
```
As it turns out, SQL (and Calcite) supports named parameters. We can
rewrite the above as follows. Notice the `name => value` syntax:
```sql
INSERT INTO dst SELECT *
FROM TABLE(extern(
inputSource => '{
"type": "inline",
"data": "a,b,1\nc,d,2\n"
}',
inputFormat => '{
"type": "csv",
"columns": ["x","y","z"],
"listDelimiter": null,
"findColumnsFromHeader": false,
"skipHeaderRows": 0
}',
signature => '[
{"name": "x", "type": "STRING"},
{"name": "y", "type": "STRING"},
{"name": "z", "type": "LONG"}
]'
))
PARTITIONED BY ALL TIME
```
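As an aside, the double encoding mentioned below is easy to see in the REST payload itself. This is a simplified sketch (field names and escaping per Druid's SQL API; the full payload may carry additional fields such as context):

```json
{
  "query": "INSERT INTO dst SELECT * FROM TABLE(extern(inputSource => '{\"type\": \"inline\", \"data\": \"a,b,1\\nc,d,2\\n\"}', inputFormat => '{\"type\": \"csv\", \"columns\": [\"x\",\"y\",\"z\"]}', signature => '[{\"name\": \"x\", \"type\": \"STRING\"}]')) PARTITIONED BY ALL TIME"
}
```

Every quote and newline in the embedded JSON strings must be escaped a second time for the outer JSON envelope.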
The above is great, but can be a bit awkward: we have to encode JSON in SQL
(which, when we send it via the REST API, we encode again in JSON). Let's see
how we can use SQL named parameters to streamline the syntax (and set ourselves
up for the catalog). SQL requires that parameter names be "simple identifiers":
that is, no dots. So, we can't just say:
```text
"inputSource.type" => "inline"
```
Instead, we have to "flatten" the names. That is, define SQL names that,
internally, we map to the JSON names. The mapping is just code, so we omit the
details here. Suppose we do the mapping. We now have a different set of
arguments, so we need a different function. For now, let's call it `staged`.
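To make the flattening idea concrete, here is a hypothetical sketch (the names and structure are invented for illustration; the real Druid code differs) of mapping flat SQL argument names back to nested JSON keys:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenSketch {
  // Hypothetical table: flat SQL argument name -> dotted JSON path.
  static final Map<String, String> SQL_TO_JSON = Map.of(
      "source", "inputSource.type",
      "data",   "inputSource.data",
      "format", "inputFormat.type"
  );

  // Rebuild the nested JSON-like structure from the flat SQL arguments.
  static Map<String, Object> unflatten(Map<String, String> sqlArgs) {
    Map<String, Object> root = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : sqlArgs.entrySet()) {
      String[] parts = SQL_TO_JSON.get(e.getKey()).split("\\.");
      @SuppressWarnings("unchecked")
      Map<String, Object> node = (Map<String, Object>)
          root.computeIfAbsent(parts[0], k -> new LinkedHashMap<String, Object>());
      node.put(parts[1], e.getValue());
    }
    return root;
  }

  public static void main(String[] args) {
    Map<String, String> sqlArgs = new LinkedHashMap<>();
    sqlArgs.put("source", "inline");
    sqlArgs.put("data", "a,b,1\nc,d,2\n");
    sqlArgs.put("format", "csv");
    // Produces {inputSource={type=..., data=...}, inputFormat={type=csv}}
    System.out.println(unflatten(sqlArgs));
  }
}
```

The point is only that the flat-to-nested mapping is mechanical; the actual argument set and conversion live inside the table function's implementation.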
We also need a way to specify the input table schema. Here we borrow another
bit of Calcite functionality: the `EXTEND` clause, which was added for Apache
Phoenix. We modify the syntax a bit to fit our needs. The result:
```sql
INSERT INTO dst SELECT *
FROM TABLE(staged(
source => 'inline',
data => 'a,b,1
c,d,2
',
format => 'csv'
))
(x VARCHAR NOT NULL, y VARCHAR NOT NULL, z BIGINT NOT NULL)
PARTITIONED BY ALL TIME
```
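The `EXTEND` column list corresponds to the `signature` JSON from the earlier examples: each SQL type maps to a Druid type. A rough sketch of that correspondence (an invented helper, not Druid's actual conversion code):

```java
public class TypeMapSketch {
  // Hypothetical SQL-to-Druid type mapping for the EXTEND column list.
  static String druidType(String sqlType) {
    switch (sqlType) {
      case "VARCHAR": return "STRING";
      case "BIGINT":  return "LONG";
      case "DOUBLE":  return "DOUBLE";
      default: throw new IllegalArgumentException("unsupported: " + sqlType);
    }
  }

  public static void main(String[] args) {
    // (x VARCHAR, y VARCHAR, z BIGINT) -> the {"name", "type"} entries
    // in the signature JSON shown earlier.
    System.out.println("x: " + druidType("VARCHAR")); // STRING
    System.out.println("z: " + druidType("BIGINT"));  // LONG
  }
}
```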
Notice how the keywords in the `staged` function arguments match the
`properties` defined in the REST call in the prior section. That is no
accident: it sets us up to merge the two ideas in the next update.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]