All,

We (Workday) currently consume Polaris as Maven dependencies and add
functionality on top of the core libraries. It started with authentication
enhancements, but now I have custom listeners for events, custom listeners for
metrics persistence, and a SCIM interface, and a data partner has asked us
to expose semantic information via OSI. Also, until the Iceberg community sorts
out FGAC (fine-grained access control) and/or metadata labels, there are asks
to share some form of hints for enforcing authorization on the data partners'
side. Our current use cases all involve federated data: our data partners
(Salesforce, Snowflake, Databricks, Google and Amazon) federate Workday data
through their catalog services.

Our timelines are very aggressive, so I add the functionality I need to our
custom build, then turn around and contribute it to Polaris OSS; when we pick
up the upstream feature, I delete the code I added. Thankfully, Quarkus makes
this relatively easy. There are some places where I cannot extend easily and
the change needs to land in Polaris OSS first (AWS session tags was one such
feature).

I want to share a concrete data point from recent work on the metrics reporting
feature (PR #4115 / #4756) that illustrates why each of Dmitri's proposed
principles matters.

We initially bundled the reporting SPI, the default no-op/logging 
implementations, the REST query service, the OpenAPI spec, auth privilege 
additions, and JDBC persistence all in a single PR. The review rightly called 
it out as too broad. Splitting it required two rounds of significant rework: 
extracting the SPI into its own module (extensions/metrics-reports/spi), moving 
the durable query path and REST service into a follow-up PR, and re-wiring CDI 
producers and downstream builds multiple times as the shape changed.
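
To make the SPI/default-implementation split concrete, here is a minimal Java
sketch of the shape we converged on. The names (`MetricsReportSink`, `select`,
the "no-op"/"logging" keys) are hypothetical and are not the actual Polaris
SPI; `select` only stands in for what a CDI producer does in the real build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch: MetricsReportSink and select() are illustrative names,
// not the actual Polaris metrics-reporting SPI.
public class MetricsReporterSpiSketch {

  /** SPI contract; in a modular build this would sit alone in an "spi" module. */
  public interface MetricsReportSink {
    void report(String tableIdentifier, String reportJson);
  }

  /** Default implementation: discards reports, safe when the feature is unused. */
  public static final class NoOpSink implements MetricsReportSink {
    @Override
    public void report(String tableIdentifier, String reportJson) {
      // Intentionally empty.
    }
  }

  /** Default implementation: logs reports instead of persisting them. */
  public static final class LoggingSink implements MetricsReportSink {
    private static final Logger LOG = Logger.getLogger(LoggingSink.class.getName());

    @Override
    public void report(String tableIdentifier, String reportJson) {
      LOG.info(() -> "metrics report for " + tableIdentifier + ": " + reportJson);
    }
  }

  /** Stand-in for a downstream implementation (e.g. JDBC-backed); it just records calls. */
  public static final class CollectingSink implements MetricsReportSink {
    private final List<String> received = new ArrayList<>();

    @Override
    public void report(String tableIdentifier, String reportJson) {
      received.add(tableIdentifier);
    }

    public List<String> received() {
      return List.copyOf(received);
    }
  }

  /**
   * Plays the role of a CDI producer: chooses an implementation by a configured
   * type name. Unknown names fail fast instead of silently falling back.
   */
  public static MetricsReportSink select(
      String type, Map<String, Supplier<MetricsReportSink>> available) {
    Supplier<MetricsReportSink> factory = available.get(type);
    if (factory == null) {
      throw new IllegalArgumentException("unknown metrics reporter type: " + type);
    }
    return factory.get();
  }

  public static void main(String[] args) {
    Map<String, Supplier<MetricsReportSink>> available =
        Map.of("no-op", NoOpSink::new, "logging", LoggingSink::new);
    select("logging", available).report("ns.orders", "{\"type\":\"scan\"}");
  }
}
```

A downstream build swaps the implementation by registering its own entry (for
example a JDBC-backed sink) without touching the module that defines the
interface.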

A few specifics that map directly to Dmitri's points:

- Isolated Gradle modules for REST API code (point 1): Putting the metrics 
query REST service (api/metrics-reports-service) in its own module meant the 
core runtime doesn't have to expose those endpoints. Downstream builds that 
don't want the query API simply omit that dependency. We learned this the hard 
way after bundling it into runtime/service initially.
- Feature-specific Persistence SPIs (point 5): MetricsPersistence and 
IcebergMetricsReporter both needed their own SPI layer so downstream 
implementations (JDBC, no-op, custom) can be swapped without touching 
polaris-core. Getting this layering right was the bulk of the review churn.
- Separate PRs for entity changes (point 4): The SQL schema additions for
metrics tables are in PR2 specifically because, when schema migrations were
mixed with SPI/API changes, reviewers couldn't easily assess the surface area
of each concern.
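
Keeping the schema separate also lines up with Dmitri's point 6 (one .sql file
per feature). A minimal Java sketch of that idea, assuming hypothetical feature
keys and script paths (not Polaris's actual layout): the core entity schema is
always applied, and each feature's script is applied only when that feature is
enabled, so the metrics tables can be reviewed and migrated on their own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: feature keys and script paths are illustrative only.
public class FeatureSchemaSketch {

  /** Entity schema shared by all deployments; always applied first. */
  static final String CORE_SCRIPT = "schema/core.sql";

  /** One isolated DDL script per feature with non-entity JDBC persistence. */
  static final Map<String, String> FEATURE_SCRIPTS = new LinkedHashMap<>();

  static {
    FEATURE_SCRIPTS.put("metrics", "schema/metrics.sql");
    FEATURE_SCRIPTS.put("events", "schema/events.sql");
  }

  /** Returns the scripts to apply, in a stable order, for the enabled features. */
  public static List<String> scriptsFor(Set<String> enabledFeatures) {
    for (String feature : enabledFeatures) {
      if (!FEATURE_SCRIPTS.containsKey(feature)) {
        throw new IllegalArgumentException("no schema registered for feature: " + feature);
      }
    }
    List<String> scripts = new ArrayList<>();
    scripts.add(CORE_SCRIPT);
    for (Map.Entry<String, String> entry : FEATURE_SCRIPTS.entrySet()) {
      if (enabledFeatures.contains(entry.getKey())) {
        scripts.add(entry.getValue());
      }
    }
    return scripts;
  }

  public static void main(String[] args) {
    // Prints [schema/core.sql, schema/metrics.sql]
    System.out.println(scriptsFor(Set.of("metrics")));
  }
}
```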

The discipline Dmitri is proposing would have saved us multiple rebase cycles 
and reviewer back-and-forth.

—
anand

From: Dmitri Bourlatchkov <[email protected]>
Date: Wednesday, June 24, 2026 at 9:43 AM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] Modular design for new features


Hi Russell,

> I'm less convinced we need a blanket policy of isolated Gradle modules,
> feature-specific SPIs, and staged entity-then-REST PRs for every new
> proposal.


Fair enough. That might indeed have been overkill in my initial email.

I think it should be fine to combine Entity changes with REST API changes
in the same PR as long as the PR remains reasonably small for ease of
review.

My concern was mainly driven by the reviewer's perspective, since
validating feature boundaries is harder when entity changes are
interspersed with REST changes.

> Most deployments run the standard server release anyway; [...]


I cannot agree with that. I believe we have several OSS users with custom
Polaris-based builds. I'm pretty sure all "integrators" have custom builds
too. This cannot be decided based on undifferentiated deployment counts
alone. We need to consider the usability of the project for downstream
builds.

The problem with bundling all new REST APIs into `runtime/service` is that
it _forces_ all downstream builds to expose the new endpoints.
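
A minimal Java sketch of the alternative (points 1 to 3 of my original
proposal): the runtime depends only on a small contract, each optional REST
module implements it, and routes are registered only when the module is in the
build and its flag is on. All names here (`EndpointContributor`, the flag, the
route) are hypothetical; in an actual Quarkus build the discovery would come
from CDI rather than a hand-built list.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: EndpointContributor, the flag name and the route are
// illustrative, not actual Polaris or Quarkus APIs.
public class OptionalEndpointsSketch {

  /** Contract owned by the runtime; no compile-time dependency on feature modules. */
  public interface EndpointContributor {
    /** Feature flag that must be enabled before these routes are exposed. */
    String featureFlag();

    /** Routes this optional module contributes. */
    List<String> routes();
  }

  /** Would live in its own optional Gradle module that downstream builds may omit. */
  public static final class MetricsQueryEndpoints implements EndpointContributor {
    @Override
    public String featureFlag() {
      return "METRICS_QUERY_API";
    }

    @Override
    public List<String> routes() {
      return List.of("GET /metrics-query");
    }
  }

  /**
   * Explicit wiring: only contributors present in the build are passed in, and
   * of those only the ones whose flag is enabled have their routes registered.
   */
  public static Set<String> registeredRoutes(
      List<EndpointContributor> presentInBuild, Map<String, Boolean> flags) {
    Set<String> routes = new TreeSet<>();
    for (EndpointContributor contributor : presentInBuild) {
      if (flags.getOrDefault(contributor.featureFlag(), false)) {
        routes.addAll(contributor.routes());
      }
    }
    return routes;
  }

  public static void main(String[] args) {
    Map<String, Boolean> flags = Map.of("METRICS_QUERY_API", true);
    // Downstream build that left the module out: nothing is exposed.
    System.out.println(registeredRoutes(List.of(), flags));
    // Build that includes the module and enables the flag.
    System.out.println(registeredRoutes(List.of(new MetricsQueryEndpoints()), flags));
  }
}
```

The point of the design is that omitting the module is enough to keep the
endpoints out; no change to the runtime is needed.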

I think the inconvenience of dealing with multiple source modules is rather
minor in modern IDEs, while the benefit for downstream project flexibility
is clear.

I wonder if Anand could share his contributor experience based on [4115].

[4115] https://github.com/apache/polaris/pull/4115

Cheers,
Dmitri.

On Wed, Jun 24, 2026 at 12:02 PM Russell Spitzer <[email protected]>
wrote:

> Thanks for raising this, Dmitri. I agree we should be careful about
> coupling new features into core and runtime in ways that are hard to
> unwind, and I'm on board with feature flags and keeping optional behavior
> off default paths where that makes sense.
>
> I'm less convinced we need a blanket policy of isolated Gradle modules,
> feature-specific SPIs, and staged entity-then-REST PRs for every new
> proposal. A big reason is contributor experience: most of the existing
> server code lives in a small number of well-understood modules
> (polaris-core, polaris-runtime-service, the generated API jars). Someone
> proposing scan metrics or a semantic layer API can find similar REST
> handlers, persistence patterns, and tests without first learning a module
> taxonomy or deciding which of several new jars their change belongs in.
>
> I'd rather optimize for single-module (or few-module) contributions by
> default: add the endpoint, service logic, and tests alongside existing
> similar code. I can understand grouping sets of functionality together, for
> example if we did want to separate Polaris-specific from Iceberg-specific
> modules, but I think the codebase is actually more usable with fewer modules
> rather than more. If a feature later proves it needs independent deployment, a
> pluggable backend, or a separate schema lifecycle, we can extract it then.
>
> Most deployments run the standard server release anyway; downstream custom
> assembly is possible but doesn't seem to be a widespread pattern today.
> Given that, I'd support lighter guidelines:
>
> 1. Don't entangle new features into core call paths unnecessarily
> 2. Use flags for optional capabilities
> 3. Default to adding code where contributors already look
> 4. Split into a separate module only when there's a demonstrated need
>
> Happy to discuss where that line should be.
>
> On Wed, Jun 24, 2026 at 9:06 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi All,
> >
> > Polaris has been getting many new and interesting proposals lately. This
> > is certainly good for the project.
> >
> > On the other hand, we need to think about the stability and usability of
> > the system as new features are introduced.
> >
> > Polaris is currently used in two modes: a) as a ready-made server for the
> > default set of features (source or binary releases) and/or b) as a basis
> > for custom downstream builds (from Maven artifacts).
> >
> > I'd like to propose the following general principles, which I hope will
> > allow quick feature development without adding risks to either of the
> > usage avenues.
> >
> > 1) Put code for new REST API services in isolated Gradle modules.
> >
> > 2) Wire those services into the runtime/server explicitly, behind feature
> > flags where appropriate.
> >
> > 3) Do not add hard dependencies from runtime/service or polaris-core to
> > REST API modules.
> >
> > 4) If a feature requires new Polaris entity types, add those core model
> > changes in a dedicated PR so the entity and persistence contract can be
> > reviewed on its own. New REST service modules can depend on those core
> > entities, but existing core call paths should not depend on
> > feature-specific entities.
> >
> > 5) Add new Persistence SPI(s) for non-entity storage (e.g. Scan/Commit
> > Metrics). Keep new SPI classes in feature-specific Gradle modules.
> >
> > 6) Use isolated SQL schema definition files for each feature involving
> > non-entity JDBC persistence.
> >
> > That is: separate .sql files for Metrics, Events, etc.
> >
> > Thoughts?
> >
> > Cheers,
> > Dmitri.
> >
>
