Howdy folks!

Thanks for joining the community meeting about the NoSQL presentation
yesterday. In this mail, I'd love to:
1. Detail the plan for moving forward with NoSQL
2. Gather more feedback on the work

*Moving Forward*
In terms of moving forward, I'll be opening up a series of sequential PRs
that are based upon the initial implementation PR:
https://github.com/apache/polaris/pull/1189 . The goal is to break this
into smaller, cohesive PRs. Those PRs will only be about items that are not
actively in discussion. For example, the original PR contains some
frameworks, such as the one that deals with Snowflake ID generation, which
can be easily carved out and reviewed on their own. As with everything
else, I'll be working with folks on what we think is best for the community.

*Feedback*
In terms of feedback, I gathered a few items from the session. I'll answer
them here and also start an FAQ Doc
<https://docs.google.com/document/d/1NvZp9ro9FXvK_jkUlKSym03BhpVQU9gqIkGFC_M5bOg/edit?tab=t.0#heading=h.h7yhlew0hwvq>
where we can keep track of the frequently asked questions. This is a large
chunk of work, and I expect that we might have the same question asked a
few times. :)

During our discussion, I noted 5 main pieces of feedback:

   1. Is there a bottleneck on the catalog content "named pointer" during
   commits?
   2. How are we handling caching with this approach?
   3. Are there any scenarios where we are going to be crossing "named
   pointers", but we need to be able to ensure consistency?
   4. In the initial implementation PR, there were a few modules concerning
   authorization
   <https://github.com/apache/polaris/pull/1189/files#diff-82794bfe7193249c378e723e9a4ca243212e18b195d353248b1c470fa9f89104>.
   Can you explain how this interfaces with the existing Polaris authorization
   system?
   5. Can we revisit the name of "Named Pointer"?


*#1 - Catalog Content Bottleneck*
*Question Details:* The catalog content “Named Pointer” needs to be updated
any time there is a write to any catalog content. This could be a
bottleneck because the compare-and-swap (CAS) of the “Named Pointer” will
only succeed if the new commit is applied on top of the commit that was
current when the commit retry loop started. If that is not the case, the
commit will have to be partially rebuilt.
*Answer:* While it is true that the commit will have to be partially
rebuilt if it fails the CAS, Pierre Laporte has done extensive scale
testing of the initial implementation PR and found that this does not limit
high concurrency in practice. I'll work with Pierre to send out the scale
testing information to the team.
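
To make the retry behavior concrete, here is a minimal sketch of a CAS
retry loop. This is not the code from the PR: NamedPointerSketch, commit(),
and the use of plain strings as commit IDs are all hypothetical names chosen
for illustration, and an in-memory AtomicReference stands in for the
database-side CAS.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a "Named Pointer": one reference to the current commit ID. */
final class NamedPointerSketch {
  // Assumption: an in-memory reference plays the role of the persisted pointer.
  private final AtomicReference<String> head;

  NamedPointerSketch(String initialCommitId) {
    this.head = new AtomicReference<>(initialCommitId);
  }

  String current() {
    return head.get();
  }

  /**
   * Tries to move the pointer to a new commit. buildCommit receives the commit ID
   * the attempt is based on and returns the proposed commit ID. If another writer
   * moved the pointer in the meantime, the CAS fails and the commit is rebuilt on
   * top of the new head.
   */
  String commit(UnaryOperator<String> buildCommit, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String expected = head.get();
      String proposed = buildCommit.apply(expected);
      // compareAndSet compares references; expected is the exact object read above.
      if (head.compareAndSet(expected, proposed)) {
        return proposed;
      }
    }
    throw new IllegalStateException("commit did not succeed after " + maxAttempts + " attempts");
  }
}
```

In this sketch the whole commit is rebuilt on each failed attempt. The
question above is about how expensive that rebuild is under contention,
which is what the scale testing measures.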

*#2 - Caching*
*Question Details:* There is some discussion about interfacing with a cache
in a layer above the persistence layer versus having the persistence layer
own the cache.
*Answer:* This model mostly relies on immutable objects, which helps with
caching. Because of that immutability, this implementation does its own
caching and does not necessarily need an EntityCache at a higher level of
abstraction above the persistence layer.
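
As a rough illustration of why immutability helps here, consider the sketch
below. It is not the cache from the PR: ImmutableObjectCacheSketch, the
string object IDs, and the loader function are hypothetical. The point is
only that an object which never changes after it is written can be cached
by ID without any invalidation logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache for objects that never change once stored under an ID. */
final class ImmutableObjectCacheSketch<V> {
  private final Map<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> loader; // assumption: reads the object from the backing store
  private final AtomicInteger loads = new AtomicInteger();

  ImmutableObjectCacheSketch(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached object, loading it at most once per ID. */
  V get(String objectId) {
    return cache.computeIfAbsent(objectId, id -> {
      loads.incrementAndGet();
      return loader.apply(id);
    });
  }

  /** Number of times the backing store was actually read. */
  int loadCount() {
    return loads.get();
  }
}
```

A cached entry here can never go stale, so the only remaining concern is
memory. A real cache would need a size bound and eviction, which this
sketch leaves out.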

*#3 - Cross "Named Pointer" Consistency*
*Question Details:* For example, when a user creates a catalog, the code
has to create a catalog role, a grant record associated with a principal
role, and a catalog. This crosses three separate “Named Pointers”. How
should we solve this?
*Answer:* If the code serializes the creation of the grant record, the
catalog role, and the catalog, this should work in practice, as long as
there is an out-of-band clean-up that restores consistency when a creation
sequence is interrupted partway.
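
One way to picture this is the sketch below. Everything in it is an
assumption made for illustration: the class and method names are
hypothetical, each in-memory set stands in for the state behind one “Named
Pointer”, and the choice to write the catalog last (so that its presence
marks a completed creation) is mine and not something decided in the PR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: ordered writes across three pointers plus an orphan clean-up. */
final class CatalogCreationSketch {
  // Each set stands in for the state behind a separate "Named Pointer".
  final Set<String> grantRecords = ConcurrentHashMap.newKeySet();
  final Set<String> catalogRoles = ConcurrentHashMap.newKeySet();
  final Set<String> catalogs = ConcurrentHashMap.newKeySet();

  /** Writes the three objects one after another; the catalog is written last. */
  void createCatalog(String name, boolean failBeforeCatalog) {
    grantRecords.add(name);
    catalogRoles.add(name);
    if (failBeforeCatalog) {
      // Simulates a crash after the first two writes.
      throw new IllegalStateException("simulated crash while creating " + name);
    }
    catalogs.add(name);
  }

  /** Out-of-band clean-up: drops grant records and roles that have no catalog. */
  int cleanUpOrphans() {
    int before = grantRecords.size() + catalogRoles.size();
    grantRecords.removeIf(name -> !catalogs.contains(name));
    catalogRoles.removeIf(name -> !catalogs.contains(name));
    return before - (grantRecords.size() + catalogRoles.size());
  }
}
```

Note that this clean-up cannot tell an abandoned creation from one that is
still in flight. A real implementation would need some way to distinguish
the two, for example by only removing leftovers older than some threshold.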

*#4 - Authorization Items*
*Answer:* The initial implementation was done on March 17th. A lot has
changed since then around project maturity. At this point, Robert & I would
only bring in the items necessary to persist grants. If the privilege
checking mechanism from the initial implementation turns out to be helpful
in the future, someone can file a separate enhancement issue to incorporate
it.

*#5 - "Named Pointer" Name*
*Question Details:* The name of "Named Pointer" tells exactly what the
thing is but not necessarily how it functions in the codebase.
*Answer:* I would be amenable to changing this name. I could see several
other names:
1. Consistency Boundary
2. State Reference
3. State Pointer
4. Consistency Groupings
In my opinion, we could probably solve this during PRs, but I understand
that names are important and hard.

If you made it to the end of this mail, congrats! Let me know if I can help
answer any questions or address any further feedback!

Go team,

Adam Christian
