dennishuo commented on code in PR #724:
URL: https://github.com/apache/polaris/pull/724#discussion_r1920924907
##########
service/common/src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java:
##########
@@ -1627,6 +1627,12 @@ private FileIO refreshIOWithCredentials(
// the credentials should always override table-level properties, since
// storage configuration will be found at whatever entity defines it
tableProperties.putAll(credentialsMap);
+
+ // Populate the internal properties to FileIO in case the FileIO
implementation needs them
+ Map<String, String> internalProperties =
+
storageInfoEntity.map(PolarisEntity::getInternalPropertiesAsMap).orElse(Map.of());
Review Comment:
We could probably benefit from some terminology to distinguish between:
1. End-user who calls Polaris REST APIs
2. "Polaris codebase/project User" who is an infrastructure admin who takes
apache/polaris and deploys it into a production environment for other
"end-users" to interact with
2a. "Default" Polaris users who usually just take a jarfile from maven
central and run it with minimal customizations to configuration
2b. "Customized" Polaris users who mix in lots of proprietary plugins
and may have a fork of the core Polaris code
If I understand correctly, the main concern is about how the Apache Polaris
project itself can ensure "project users" (2) can be supported as the code
evolves.
@XJDKC 's point seems relevant though that at the lowest common denominator,
there's already a property bag that can be set by the end-users (1) that can
already plumb through behavioral changes either in the base Iceberg FileIO
directly or by interpretation from the FileIOFactory, regardless of this PR.
Whether we should make a change to lock that down more and/or enumerate
"supported" properties could be worth a separate discussion outside of this PR.
This intersects with concerns about refactors of Polaris internals because
in theory (2b) users could privately come to rely on injection of
custom-meaning properties into that public `properties` map as much as they
could do so on `internalProperties`.
As I mentioned before, IMO passing in the `resolvedStorageEntity` doesn't
preclude adding more rigid "structure" to what is being used in the factory and
in particular it needn't encourage any direct use of `internalProperties`
either.
In particular, we can still make the `DefaultFileIOFactory` effectively
enumerate the officially supported set of configuration that we'll be careful
as a project to ensure carries over from 1.0 to 2.0, etc. The "Default" polaris
service-runners (2a) will only be using such out-of-the-box factories, so we'll
be able to know exactly what we're allowed to evolve without breaking (2a).
Those in the (2b) category are explicitly signing up to do their own legwork
*if* they both add plumbing of custom internal properties and use those custom
properties in a custom FileIOFactory, and I think what we're saying here is
that this extensibility mechanism is an explicit design choice to allow such
advanced use cases.
I suppose alternatively, as a project we could assert that we want to avoid
encouraging such extensibility mechanisms, and that a persistent code-fork is
the preferred way to customize FileIO construction in such ways instead.
Both approaches kind of say the same thing -- "deep customizations mean you
have to do your own homework on major-version backwards compatibility", and I
guess requiring a code fork is a bit more explicit, but I guess my main point
is that we can still convey that delineation of a backwards-compatibility
contract despite allowing an "easier" extensibility mechanism.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]