paul-rogers commented on PR #13165:
URL: https://github.com/apache/druid/pull/13165#issuecomment-1295634506
> One thing I think would be good is if in the URI path, when specifying a
> table within a specific schema, the schema should come before table in the path
As it turns out, the schema does come before the table:
`POST /resource/tables/{schema}/{table}[?version={n}|overwrite=true|false]`
I suspect the confusing bit is the "/resource/tables". The thought here was
that, at present, the catalog has only tables. Most DBs allow user-defined
schemas. We may want to add connections for things like S3, Kafka, etc. So, the
thought was we'd have a variety of "resources", each with its own naming
convention. So, in the future (not now):
`POST /resource/schemas/{schema}`
`POST /resource/connections/{conn}`
Etc.
Here, we could simplify: the `resource` part could be removed:
`POST /tables/{schema}/{table}`
`POST /schemas/{schema}`
`POST /connections/{conn}`
(Everything here is simplified, BTW, there is a common prefix which I'm
omitting.)
> Maybe the schema can be included in the request payload
That is one solution: to add a table, `POST /tables`. The problem is, the
result is asymmetric on get: `GET /tables` might return everything, so one
would do `GET /tables/{schema}/{table}`. Plus, the content would differ: to
create we provide a name, but on get, the response doesn't need the name
because we already have it.
Then, there is the update ambiguity: `POST /tables/{schema}/{name}` says
which table we want to update. If the name also appears in the request, then we
(and the user) would have to ensure that they match. I suppose we could do
update as `POST /tables` with the name in the body...
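To make the update ambiguity concrete, here is a minimal sketch of the extra check we'd need if the name appeared in both the path and the body. The helper name `validate_update` and the body fields `schema` and `name` are hypothetical, made up for illustration; they are not the catalog's actual API.

```python
def validate_update(path_schema, path_table, body):
    """Reject an update whose body names a different table than the path.

    Hypothetical helper: it only illustrates the check that becomes
    necessary once the table name travels in both the URL and the payload.
    """
    body_schema = body.get("schema")
    body_name = body.get("name")
    if (body_schema, body_name) != (path_schema, path_table):
        raise ValueError(
            f"path names {path_schema}.{path_table}, "
            f"but body names {body_schema}.{body_name}"
        )
    return body
```

With the name only in the path, this function (and the error case the user has to handle) simply goes away.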
A final comment is that the present design says that the name is the place
you store your table spec: it isn't an attribute of the spec. This means I can
post the same spec under multiple names: one for dev, another for test, and
another for prod. (Since Druid doesn't allow user-defined namespaces, the best
thing is "dev_events", "test_events" and "event" for dev, test and prod.) If
the name were in the spec, then the spec would have to be modified for each
use. (And, the DB record would store the name twice: once in the key field,
another in the spec, resulting in redundancy and another thing to verify on
every update.)
The existing and proposed designs allow the same spec format for create,
update and read. They also allow the same spec to be posted to dev, test and
prod tables. I think we want to keep each of these features. That said, I'm
open to revisions about _how_ we provide those features.
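A rough sketch of the "name is the place, not an attribute" idea, using an in-memory dict as a stand-in for the catalog table. The `CatalogStore` class and the spec fields are assumptions for illustration only; the real spec format is whatever the PR defines.

```python
import copy


class CatalogStore:
    """In-memory stand-in for the catalog; the key is (schema, table)."""

    def __init__(self):
        self._specs = {}

    def post(self, schema, table, spec):
        # The name lives only in the key, never inside the stored spec.
        self._specs[(schema, table)] = copy.deepcopy(spec)

    def get(self, schema, table):
        # Returns the same object shape that was posted.
        return copy.deepcopy(self._specs[(schema, table)])


# Hypothetical spec contents; note there is no "name" field.
spec = {"type": "datasource", "columns": [{"name": "__time", "type": "TIMESTAMP"}]}

store = CatalogStore()
for table in ("dev_events", "test_events", "event"):
    store.post("druid", table, spec)  # same spec, unmodified, three names
```

Because the spec carries no name, create, update and read all use one format, and nothing has to be edited when promoting from dev to test to prod.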
> its a bit awkward that the table is specified as a resource in some apis,
and as a entry in others
The difference in "themes" was due to the `/schemas/{schema}/{table}` vs.
`/schemas/{schema}/{operation}` ambiguity if we do the obvious solution and try
to use a common base for both. But, if we're OK with
`/schemas/{schema}/{operation}` and `/tables/{schema}/{table}/{operation}`,
then we can almost, but not quite, combine resources and entries.
For tables:
* `POST /tables/{schema}/{table}[?version={n}|overwrite=true|false]`
add/update a table
* `POST /tables/{schema}/{table}/edit` "edit" (incremental update) a table
* `GET /tables/{schema}/{table}` Get the table spec for a table (same object
as for create/update)
* `GET /tables/{schema}/{table}/metadata` Get the table metadata (name,
update date, state, spec, etc.)
* `GET /tables` Get metadata for all tables in all schemas
For schemas:
* `POST /schemas/{schema}` Create a schema (not yet supported!)
* `GET /schemas/{schema}` Get metadata for a schema (not yet supported!)
* `GET /schemas/{schema}/names` Get the names of tables within the schema
* `GET /schemas/{schema}/tables` Get the metadata for each table in the
schema
In the above, however, there is no good way to get the names of all tables
in all schemas: `GET /schemas/names` won't work (ambiguous). `GET /schemas`
won't work (would imply getting the metadata (contents) for all schemas).
This is the "trying to be too clever" issue that made the original API a bit
awkward: we had to do some song and dance to work around ambiguities.
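To illustrate the ambiguity, here is a toy template matcher (my own sketch, not the router Druid uses) showing that `/schemas/names` fits two templates at once. The `matches` and `candidates` helpers are hypothetical.

```python
def matches(template, path):
    """True if `path` fits `template`, where `{x}` matches any one segment."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    return all(
        (t.startswith("{") and t.endswith("}")) or t == p
        for t, p in zip(t_parts, p_parts)
    )


def candidates(templates, path):
    """All templates that the given path could be routed to."""
    return [t for t in templates if matches(t, path)]


resource_first = ["/schemas/{schema}", "/schemas/names"]
print(candidates(resource_first, "/schemas/names"))
# prints ['/schemas/{schema}', '/schemas/names']
```

A router that prefers literal segments would pick `/schemas/names`, but then a schema that happens to be called `names` can no longer be addressed, which is the kind of workaround I'd like to avoid.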
The proposal in the earlier message resolves these issues by saying _what
you want to do_, then saying, _what you want to do it on_. That way, to get
names:
* `GET /names/schemas` says to get all schema names
* `GET /names/tables` says to get all table names in all schemas
* `GET /names/schema/{schema}` says to get all table names in the given
schema
(The above is a refinement of the earlier proposal.)
The same pattern is then repeated for metadata with `entries`.
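As a sketch of how the action-first layout dispatches, assuming an in-memory catalog keyed by (schema, table): the sample entries, the `handle_get` function and the response shapes (plain lists) are all made up for illustration.

```python
# Hypothetical catalog contents: (schema, table) -> spec.
catalog = {
    ("druid", "dev_events"): {},
    ("druid", "event"): {},
    ("ext", "kafka_in"): {},
}


def handle_get(catalog, path):
    """Dispatch the three `names` routes: say what you want, then on what."""
    parts = path.strip("/").split("/")
    if parts == ["names", "schemas"]:
        return sorted({schema for schema, _ in catalog})
    if parts == ["names", "tables"]:
        return sorted([schema, table] for schema, table in catalog)
    if len(parts) == 3 and parts[:2] == ["names", "schema"]:
        return sorted(t for s, t in catalog if s == parts[2])
    raise KeyError(path)


print(handle_get(catalog, "/names/schemas"))
print(handle_get(catalog, "/names/schema/druid"))
```

Since the fixed words (`names`, `schemas`, `tables`, `schema`) always come before any user-supplied name, no schema or table name can collide with an operation.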
Again, I think we need the lists of names, and the lists of contents. We
need it for the whole system, for everything in a schema, and for a single
table. Again, I'm open to other ways of accomplishing the goals.
Ideas?