Howdy folks!

Thanks for joining the community meeting about the NoSQL presentation yesterday. In this mail, I'd love to:

1. Detail the plan of moving forward with NoSQL
2. Gather more feedback on the work

*Moving Forward*

In terms of moving forward, I'll be opening a series of sequential PRs based on the initial implementation PR: https://github.com/apache/polaris/pull/1189 . The goal is to break it into smaller, cohesive PRs. Those PRs will only cover items that are not actively under discussion. For example, some of the frameworks built in that original PR to deal with Snowflake ID generation can be easily carved out and reviewed on their own. Like everything else, I'll be working with folks on what we think is best for the community.

*Feedback*

In terms of feedback, I gathered a few items from the session. I'll answer them here and also start an FAQ Doc <https://docs.google.com/document/d/1NvZp9ro9FXvK_jkUlKSym03BhpVQU9gqIkGFC_M5bOg/edit?tab=t.0#heading=h.h7yhlew0hwvq> where we can keep track of frequently asked questions. This is a large chunk of work, and I expect that we might have the same question asked a few times. :)

During our discussion, I noted 5 main pieces of feedback:

1. Is there a bottleneck on the catalog content "Named Pointer" during commits?
2. How are we handling caching with this approach?
3. Are there any scenarios where we cross "Named Pointers" but still need to ensure consistency?
4. In the initial implementation PR, there were a few modules concerning authorization <https://github.com/apache/polaris/pull/1189/files#diff-82794bfe7193249c378e723e9a4ca243212e18b195d353248b1c470fa9f89104>. Can you explain how this interfaces with the existing Polaris authorization system?
5. Can we revisit the name of "Named Pointer"?

*#1 - Catalog Content Bottleneck*

*Question Details:* The catalog content "Named Pointer" needs to be updated anytime there is a write to any catalog content.
This could be a bottleneck because the compare-and-swap (CAS) of the "Named Pointer" will only succeed if the new commit ID is committed on top of the commit that was the latest one when the commit retry loop started. If that is not the case, the commit has to be partially rebuilt.

*Answer:* While it is true that the commit has to be partially rebuilt if it fails the CAS, Pierre Laporte has done extensive scale testing of the initial implementation PR and found that this does not limit high concurrency in practice. I'll work with Pierre to send out the scale testing information to the team.

*#2 - Caching*

*Question Details:* There is some discussion about interfacing with a cache in a layer above the persistence layer versus having the persistence layer own the cache.

*Answer:* This model mostly relies on immutable objects, which helps with caching. This implementation does its own caching and, because of the object immutability, does not necessarily need an EntityCache at a higher level of abstraction above the persistence layer.

*#3 - Cross "Named Pointer" Consistency*

*Question Details:* For example, when a user creates a catalog, the code has to create a catalog role, a grant record associated with a principal role, and a catalog. This crosses three separate "Named Pointers". How should we solve this?

*Answer:* If the code serializes the creation of the grant record, the catalog role, and the catalog, this should be solved in practice, as long as there is an out-of-band clean-up that restores consistency if one of the steps fails.

*#4 - Authorization Items*

*Answer:* The initial implementation was done on March 17th. A lot has changed since then around project maturity. At this point, Robert & I would only bring in the items necessary to persist grants. If the privilege checking mechanism from the initial implementation proves helpful in the future, someone can file a separate enhancement issue to incorporate it.

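For readers less familiar with the pattern, the commit retry loop described under #1 can be sketched roughly as follows. This is a minimal Python illustration and not code from the PR: `Commit`, `NamedPointer`, `compare_and_swap`, `commit_with_retry`, and `run_demo` are hypothetical names, a lock-guarded in-memory value stands in for the database CAS, and sequential integers stand in for real commit IDs.

```python
import threading
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Immutable commit object (hypothetical, not a Polaris type)."""
    commit_id: int
    parent_id: Optional[int]
    changes: Tuple[str, ...]


class NamedPointer:
    """Hypothetical in-memory stand-in for a named pointer with CAS semantics."""

    def __init__(self, head: Commit) -> None:
        self._head = head
        self._lock = threading.Lock()  # simulates the atomicity of a database CAS

    def get(self) -> Commit:
        with self._lock:
            return self._head

    def compare_and_swap(self, expected: Commit, new: Commit) -> bool:
        # Succeeds only if the head is still the commit the writer started from.
        with self._lock:
            if self._head.commit_id != expected.commit_id:
                return False
            self._head = new
            return True


def commit_with_retry(pointer: NamedPointer, change: str, max_attempts: int = 1000) -> int:
    """Apply one change on top of the current head; return the attempts needed."""
    for attempt in range(1, max_attempts + 1):
        base = pointer.get()
        # Rebuild the candidate on top of the head observed in this attempt.
        # (Assumption: the real implementation rebuilds only part of the commit.)
        candidate = Commit(base.commit_id + 1, base.commit_id, base.changes + (change,))
        if pointer.compare_and_swap(base, candidate):
            return attempt
    raise RuntimeError("gave up after too many CAS conflicts")


def run_demo(writers: int = 4, commits_per_writer: int = 25) -> Commit:
    """Run several concurrent writers against one pointer and return the final head."""
    pointer = NamedPointer(Commit(0, None, ()))

    def worker(name: str) -> None:
        for i in range(commits_per_writer):
            commit_with_retry(pointer, f"{name}-{i}")

    threads = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pointer.get()
```

Calling `run_demo()` starts four writer threads against a single pointer. A writer that loses the CAS rebuilds its commit on the new head and tries again, so no change is lost. The sketch says nothing about throughput; that is what the scale testing is meant to show.
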
*#5 - "Named Pointer" Name*

*Question Details:* The name "Named Pointer" tells exactly what the thing is but not necessarily how it functions in the codebase.

*Answer:* I would be amenable to changing this name. I could see several other names:

1. Consistency Boundary
2. State Reference
3. State Pointer
4. Consistency Groupings

In my opinion, we could probably settle this during the PRs, but I understand that names are important and hard.

If you made it to the end of this mail, congrats! Let me know if I can help address any further questions or feedback!

Go team,

Adam Christian