[GitHub] [iceberg] kbendick commented on pull request #3561: [CORE] Specification for an HTTP REST catalog

GitBox Wed, 01 Dec 2021 20:40:30 -0800


kbendick commented on pull request #3561:
URL: https://github.com/apache/iceberg/pull/3561#issuecomment-984286915



   First, thank you for the extensive review @rymurr! I truly do appreciate it.
   
   > A few general comments from me:
   > 
   > 1. 412 feels weird. the fact that some people may have load balancers in 
front and some load balancers standard config aren't clear about what 404 means 
isn't a great reason for a non-standard overload of a return code.
   
   For reference, the primary load balancers I've worked with when encountering 
the want / desire to do this have been AWS provided. That's not meant as a dig 
on AWS ELBs or ALBs, but just to say that it's not a niche set of load 
balancers we're talking about. Amazon only advertise general 4xx and 5xx 
counts, but it's pretty common to have metrics from the load balancer or 
fronting servers (nginx, proxy layers) on specific codes out of the box and 
it's pretty common for people to parse specific codes from ELB logs via means 
that are typically outside of the normal observability stack (i.e. in-app / 
in-client metrics or reported events, if any).
   
   That's not necessarily reason to not use 404, but I did want to point out 
that it likely affects most all of us. It's by no means a legacy issue or only 
a concern for a small set of tools. It could just be a misconfigured server as 
well. Or misconfigured DNS. A number of things can cause application metrics to 
be missing (including metrics from any kind of mesh network - though you're 
probably not in a good spot regardless then).
   
   My other reason to avoid 404 is that it doesn't provide information about 
what was not present - in some cases, a namespace or a specific table might not 
be present from the same call and 404 for both requires additional tooling to 
monitor that, especially client side. Having a dedicated code for each is 
helpful there too. Using even specific codes in one response (409, 412, etc), 
can indicate more rich information and so I'm partial to that and having it 
documented vs a large number of 404s.
   
   But there does seem to be pushback to this idea, so I'm willing to convert 
to 404 if enough people feel strongly about it. I'm going to handle that later 
as there's more feedback to cover, but will revert if people feel strongly.
   
   > 2. Thinking about the API from a high level perspective (as opposed to a 
mapping of existing java interfaces) it seems we have two entities to consider: 
Catalog and Table. Catalog operations are mostly related to high level 
concerns: namespaces, listings. properties, broad config etc and typically 
don't require modification of iceberg metadata (therefore don't require 
locking). Whereas tables require locking, modification of the filesystem/object 
store etc.
   
   I don't necessarily agree with the idea that Table requires locking and 
Catalog doesn't, only because I don' think this new catalog should necessarily 
prescribe how things are done either (just as the API doesn't currently fully 
prescribe how a catalog can achieve certain things). Though I do see where 
you're coming from and it's `Table`s that cause `Transaction`. So I don't 
completely disagree by any means.
   
   Also, in some scenarios locking might be needed on the catalog, such as 
highly concurrent multi-tenant environments like Kyuubi or large Trino 
deploymens etc.
   
   I've avoided `Transaction` for now because of it in this PR. As well as 
table creation. Those can be tackled separately.
   
   > With this separation in mind (and the fact that we haven't specified in 
the API how to guarantee atomicity) I thought keeping the catalog api (eg 
`/v1/catalog`) to only deal w/ namespaces, properties etc may be a better 
separation. We could move the CRUD operations for tables to `/v1/table` paths 
and overload less the meaning of some of the operations here.
   > 
   > Eg:
   > 
   > 1. Post/Put/Delete/Get/Head on `v1/catalog/{namespace}` can refer to 
create, update properties, delete, get properties and check namespace existence
   > 2. Post/Put/Delete/Get/Head on `v1/table/{namespace}/{table}` can refer to 
create, update table, delete, get table and check table existence
   >    Then 2. only has to deal w the atmoic CRUD operations and we override 
much less.
   
   I'm not opposed to adding more separation on the routes, but admittedly I've 
moved a few of them back and forth a few times so I'll hold off momentarily and 
focus on cleaning up other parts (especially types) and providing more examples 
as well as working on the annotations for my own use in case we want to go that 
route. I'm actually finding it slightly more effective to just work on this 
document vs using the annotations, but the JSON representation of some items 
isn't out of the box without an additional library.
   
   Ryan covered some reasons to have additional string constants in the path, 
which I agree with.
   
   We'd need more routes than what you proposed still, such as removeProperties 
and setProperties (on a namespace or on a table, but two separate issues). And 
renameTable.
   
   For now this PR mostly aims to cover `catalog` related issues and 
non-transaction based concerns on tables (so simple gets, list, and presently 
the main drop interface - RD from CRUD on tables).
   
   Sorry for the long response, but thank you for your thorough review. I 
wanted to give people enough time to respond before working on it further given 
it became somewhat of a quick back and forth and then I was on vacation.
   
   Thanks again @rymurr!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] kbendick commented on pull request #3561: [CORE] Specification for an HTTP REST catalog

Reply via email to