Hi all,

I read through the proposal and the comments. One framing that may help us
converge is to split the proposal into a few separate decisions instead of
reviewing it as one bundled “OpenLineage support in Polaris” feature.

This ties into what I understand to be a broader direction for Polaris as a
platform: it should be flexible enough to support different deployment and
integration use cases, yet batteries-included enough to be useful out of
the box. For lineage, I think that means we should explicitly separate:
what Polaris promises as native lineage semantics, what the default battery
implementation does, and what should remain pluggable for richer or
deployment-specific implementations.

I have been using a similar exercise in a recent SPI proposal draft: first
separate the external contracts, the default/battery implementation, the
extension implementations, and the provider-facing replacement points; only
then decide on the implementation. I think that exercise applies well here
because this proposal touches several different boundary types at once:
ingest protocol, Polaris-native lineage model, persistence, query API,
downstream forwarding, auth, and dataset resolution.

The questions I think we should separate are:

   1. *OpenLineage compatibility:* Do we require existing OpenLineage
   clients to emit to Polaris by changing only the endpoint/config?
      - If yes, then a server-side OpenLineage-compatible adapter endpoint
      makes sense.
      - If not, another option is a Polaris-provided OpenLineage
      transport/client shim that reshapes OpenLineage events into a
      Polaris-native lineage API.
      - Those are different adoption tradeoffs, and I think we should choose
      intentionally rather than letting OpenLineage compatibility implicitly
      define the Polaris-native API.
   2. *Polaris-native lineage model:* Should the long-term Polaris lineage
   model/query API be OpenLineage-specific, or framework-agnostic with
   OpenLineage as one adapter?
      - My preference is the latter. OpenLineage compatibility is useful,
      but I would avoid making the OpenLineage payload shape the Polaris-native
      lineage model by accident.
   3. *Default battery behavior:* What should work out of the box?
      - If query is part of the initial release, I think the battery needs
      enough local state to answer a minimal query. A narrow default could be:
      latest observed direct table-level upstreams for a Polaris-managed target
      table, with observed timestamp, producer/engine identifier, and upstream
      dataset refs.
   4. *Extension implementations:* What should be pluggable or future work?
      - I would put raw OpenLineage forwarding/proxying, external backend
      query, full graph history, multi-hop traversal, column-level query,
      job/run graph, pruning/staleness, and richer governance-aware behavior
      into extension/future implementation areas rather than the default
      battery.
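
To make the narrow default in point 3 concrete, here is a rough sketch of
what the battery's local state could look like. It is written in Python only
for brevity, and every name in it (TableLineageSummary, LatestSummaryStore,
the field names) is hypothetical; none of it comes from the proposal or from
an existing Polaris API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TableLineageSummary:
    """Hypothetical record: latest observed direct upstreams of one table."""
    target_table: str             # Polaris-managed target table identifier
    upstream_dataset_refs: tuple  # direct upstreams only, no multi-hop
    producer: str                 # producer/engine identifier
    observed_at: datetime         # when this lineage was observed


class LatestSummaryStore:
    """Keeps only the newest summary per target table (no history)."""

    def __init__(self):
        self._latest = {}

    def record(self, summary):
        current = self._latest.get(summary.target_table)
        # Ignore observations older than the one already stored.
        if current is None or summary.observed_at >= current.observed_at:
            self._latest[summary.target_table] = summary

    def latest(self, target_table):
        # Returns None when no lineage has been observed for the table.
        return self._latest.get(target_table)
```

The point of the sketch is the size of the commitment: one row per target
table, overwritten on each newer observation, with history, pruning, and
graph traversal left to extensions.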

*One subtle point*: I do not think the default battery and the REST/API
envelope need to have exactly the same scope.

The default battery can be intentionally small, for example a latest direct
table-level lineage summary for Polaris-managed target tables. *But the
REST/API envelope can still be designed so that richer implementations are
possible later or through extensions*. For example, the API can carry
metadata such as *granularity (table, column, job, etc.), format/source
protocol (OpenLineage or another lineage framework)*, or requested mode, so
that Polaris can route handling to the configured provider without requiring
every default implementation to support every mode.

Said differently, I would separate:

   - what the API envelope can represent;
   - what the default battery actually guarantees;
   - what extension implementations can support.
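
As an illustration of that three-way split, here is a minimal Python sketch
in which the envelope can express more than the default battery guarantees.
All names (LineageQuery, Granularity, DefaultBatteryProvider, route) and the
"latest" mode value are assumptions made up for this example, not proposed
API.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    TABLE = "table"
    COLUMN = "column"
    JOB = "job"


@dataclass(frozen=True)
class LineageQuery:
    """Hypothetical envelope: can represent more than the battery supports."""
    target: str
    granularity: Granularity = Granularity.TABLE
    source_format: str = "polaris-native"  # could also be e.g. "openlineage"
    mode: str = "latest"


class UnsupportedLineageQuery(Exception):
    """Raised when no configured provider supports the requested query."""


class DefaultBatteryProvider:
    """Guarantees only latest table-level queries."""

    def supports(self, query):
        return (query.granularity is Granularity.TABLE
                and query.mode == "latest")

    def answer(self, query):
        # Placeholder result; a real battery would read its local state.
        return {"target": query.target, "upstreams": []}


def route(query, providers):
    """Hand the query to the first provider that declares support for it."""
    for provider in providers:
        if provider.supports(query):
            return provider.answer(query)
    raise UnsupportedLineageQuery(
        f"no provider for {query.granularity.value}/{query.mode}")
```

With only the default battery configured, a table-level "latest" query is
answered and a column-level query is rejected explicitly; an extension
provider could be added to the list to widen what is supported without
changing the envelope.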

*My concrete recommendation would be*:

If Polaris exposes a lineage Query API in the initial release, the default
battery should provide a minimal latest table-level summary implementation
so the query works out of the box. If we do not want any local persistence
in the initial release, then I think the Query API should be out of scope
for the initial release or clearly extension-provided. I would avoid
exposing a core query API whose default implementation cannot answer
anything.

*My preferred shape would be*:

   - Polaris-native lineage semantics stay *framework-agnostic*.
   - OpenLineage is supported as an adapter/adoption path, *not as the only
   Polaris lineage model*.
   - The default battery, if query is in scope, is a latest direct
   table-level lineage summary only.
   - *The API envelope leaves room for richer provider implementations*.
   - Full OpenLineage backend behavior, downstream forwarding/proxying,
   historical graph, column lineage, job/run lineage, multi-hop query,
   pruning/staleness, and external backend query *are extension or future
   work*.

This would still give Polaris a useful out-of-the-box lineage experience,
while avoiding turning Polaris into a full lineage backend in the first
step.

-ej

On Mon, Jun 8, 2026 at 2:31 PM Adnan Hemani via dev <[email protected]>
wrote:

> Hi Robert,
>
> > Is my understanding correct that option 1 is out of scope from your
> > perspective, and option 2 is not sufficient for the M0 you have in mind? In
> > other words, you are proposing option 3 as the baseline, with active
> > planning toward option 4?
>
> Yes, that's correct. Happy to hear others' opinions, but Option 4 has been
> detailed in the proposal document since the very start. I'm happy to wait a
> few more days for others' opinions, but as of now I don't see any active
> opposition to the plans as-is and the "lazy consensus" suggested deadline
> was over 2 weeks ago. I-Ting and I will start implementation in the
> meantime.
>
> Best,
> Adnan Hemani
>
> On Mon, Jun 8, 2026 at 3:19 AM Robert Stupp <[email protected]> wrote:
>
> > Hi all,
> >
> > Thanks Adnan, that helps clarify the shape.
> >
> > I think this is the point where broader community input would be useful,
> > because options 3/4 are a materially different commitment from options
> > 1/2.
> >
> > Is my understanding correct that option 1 is out of scope from your
> > perspective, and option 2 is not sufficient for the M0 you have in mind?
> > In other words, you are proposing option 3 as the baseline, with active
> > planning toward option 4?
> >
> > Option 3 does not just put a proxy endpoint in Polaris.
> > It makes Polaris responsible for the OL ingest path: dataset-name
> > resolution, per-entity authZ over OL assertions, policy for non-Polaris
> > datasets, trusted-service credentials to downstream systems, request-size
> > and payload limits, forwarding failure semantics, audit behavior, and
> > tenant isolation.
> >
> > Option 4 then adds a Polaris-local lineage storage/query subsystem.
> > Even if the first version stores only a reduced projection, Polaris would
> > take on many responsibilities of an OL backend: persistence semantics,
> > query semantics, staleness/pruning, auth-filtered reads, backend
> > compatibility, migrations, limits, and long-term compatibility with OL
> > event shapes.
> > At that point, even if intentionally limited, Polaris effectively
> > operates as an OL backend for the supported subset.
> >
> > So before we treat option 3 plus active planning toward option 4 as the
> > M0 baseline, I think it would be good to hear whether others agree that
> > Polaris should take on that implementation and maintenance surface for
> > the first milestone.
> >
> > Or whether we should start with a smaller integration point first.
> >
> > Robert
> >
>
