Hi Adnan,

> the "/lineage" API is defined in the OpenLineage spec.

Could you provide a pointer to where this is defined on the OL side?

Thanks,
Dmitri.

On Fri, Jun 12, 2026 at 4:33 PM Adnan Hemani via dev <[email protected]> wrote:

> Hi EJ,
>
> Unfortunately, the "/lineage" API is defined in the OpenLineage spec.
> Changing this out for Polaris would require client-side changes - leading
> us to the same situation that you confirmed after your investigation.
>
> I agree with tackling the implementation similarly to what you've outlined.
> However, breaking this design into those topics may create more chaos than
> good because all these topics must work hand-in-hand design-wise and no
> other non-OpenLineage proposals for Data Lineage are expected in the near
> future. I request everyone to please review the initial PR that sets the
> Ingest API in Polaris: https://github.com/apache/polaris/pull/4667.
>
> Best,
> Adnan Hemani
>
> On Fri, Jun 12, 2026 at 12:02 PM EJ Wang <[email protected]> wrote:
>
> > Hi Adnan,
> >
> > I think your point about adoption is right, and I'd revise part of my
> > earlier framing after looking more closely at how existing OpenLineage
> > integrations work.
> >
> > I was previously thinking too much about whether clients could emit a more
> > Polaris-native or framework-agnostic payload. But that is probably not the
> > right first-slice adoption model. Existing OL producers generally already
> > emit OpenLineage events, and the common low-friction knob is the transport
> > target, URL/endpoint, not a uniform way to wrap or reshape the event body.
> >
> > So I agree that the first slice should optimize for endpoint retargeting
> > and raw OL event ingestion. Clients should not need to know that the
> > backend is Polaris or learn a Polaris-specific payload shape.
> >
> > *The design question I'd still like us to make explicit is where the
> > OpenLineage specificity lives*.
> > My preference would be to make it explicit at the ingress/API layer, for
> > example with an OpenLineage-specific route under the lineage namespace
> > such as: /.../lineage/openlineage
> >
> > That still preserves endpoint-retargeting for existing OL producers, while
> > avoiding ambiguity about whether the generic `/lineage` namespace is an
> > OpenLineage contract or a broader Polaris lineage namespace. It also leaves
> > room for future `/lineage/<format>` ingress adapters if Polaris later
> > supports other lineage formats or frameworks.
> >
> > Behind that ingress route, I'd like to keep the platform boundary
> > Polaris-owned. I would separate:
> >
> > 1. *OpenLineage REST ingress/API*: an OL-aware endpoint that accepts raw
> >    OL events.
> > 2. *Polaris lineage capability boundary*: a Polaris-owned contract behind
> >    ingress.
> > 3. *Default/OOTB implementation*: a small bundled implementation that
> >    proves the SPI capability (encapsulate correctly and expose sufficiently
> >    for extension impls) works end-to-end.
> > 4. *Extension implementations*: richer provider/proxy/forwarder/custom
> >    behavior for deployments that need it.
> >
> > This is not meant to reduce OpenLineage support. Quite the opposite:
> > OpenLineage can be the first explicit supported ingress format. The point
> > is to make the specificity explicit where it belongs, so Polaris can
> > support OpenLineage well now while preserving room for future contributions
> > in the right layer.
> >
> > *With that framing, I'd suggest*:
> > - Initial PR: OpenLineage-specific ingress + Polaris lineage capability
> >   boundary + minimal default/OOTB path.
> > - Follow-up PRs: proxy/forwarder/custom provider implementations and
> >   richer behavior.
> > - Query/persistence semantics: separate unless this proposal is explicitly
> >   adding a read/query API.
> >
> > I think that would support the adoption goal you described, while keeping
> > Polaris extensible in an organized way.
> >
> > -ej
> >
> > On Thu, Jun 11, 2026 at 8:19 PM Adnan Hemani <[email protected]> wrote:
> >
> >> Hi EJ,
> >>
> >> Thanks for looking at the proposal. I've responded to most of your
> >> comments on the document itself, but I'll summarize the stances here to
> >> close the loop.
> >>
> >> I am consciously making an effort to let the OpenLineage standard drive
> >> the requirements here; this is a feature, not a bug. IMO, OpenLineage is
> >> by far the most well-used standard for data lineage; I don't even know of
> >> any other significant competitors. Big Data engines like Spark and Trino,
> >> which represent a significant use case for Polaris, have OpenLineage
> >> integrations and nothing else. Going the extra mile for further flexibility
> >> to de-couple our lineage implementations from OpenLineage will likely not
> >> produce any ROI in terms of work IMO. Happy to hear any other thoughts on
> >> this topic.
> >>
> >> I also don't agree that Polaris should morph into a full-fledged
> >> OpenLineage server. I don't think the Polaris community is attempting to
> >> make a "Swiss-Army Knife" tool out of Polaris. For major lineage use cases,
> >> users absolutely should be redirected to other servers like Marquez where
> >> they can get full graph history, multi-hop traversal, jobs/runs info, etc.
> >> I disagree with the "extensions" piece of your email based on this
> >> reasoning.
> >>
> >> Regarding the "out-of-the-box" experience, I have no doubt: Polaris
> >> cannot have lineage information. An admin must take a small step to
> >> configure how they want to enable Lineage data persistence: either for
> >> Polaris-local persistence or for the passthrough/proxy/AuthZ layer modes. I
> >> think you've missed some of the points in the mailing thread replies above;
> >> the Query API is really only helpful when using the Polaris local
> >> persistence mode.
> >> The current plan is to build toward "passthrough" mode
> >> first, with plans to support the Polaris local implementation soon
> >> afterward. A Query API won't be introduced until the Polaris local
> >> implementation work begins. This means there's no implication that a Query
> >> API will exist without returning data to the user. You can see this in my
> >> first PR, where only the Ingest API is implemented:
> >> https://github.com/apache/polaris/pull/4667.
> >>
> >> One last note/suggestion for you: the term "default battery" on its own
> >> generally doesn't make much sense. I'm only able to piece together your
> >> comments because you used the phrase "batteries included" in this morning's
> >> community sync. I would usually use "out-of-the-box (OOTB)" or "default
> >> implementation". Using similar terms in the future would improve
> >> readability in general.
> >>
> >> Best,
> >> Adnan Hemani
> >>
> >> On Thu, Jun 11, 2026 at 4:12 PM EJ Wang <[email protected]> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I read through the proposal and the comments. One framing that may help
> >>> us converge is to split the proposal into a few separate decisions instead
> >>> of reviewing it as one bundled “OpenLineage support in Polaris” feature.
> >>>
> >>> This seems related to a broader direction I understand for Polaris as a
> >>> platform: it should be flexible enough to support different deployment and
> >>> integration use cases, but still battery-included enough to be useful out
> >>> of the box. For lineage, I think that means we should explicitly separate:
> >>> what Polaris promises as native lineage semantics, what the default battery
> >>> implementation does, and what should remain pluggable for richer or
> >>> deployment-specific implementations.
> >>>
> >>> I have been using a similar exercise in a recent SPI proposal draft:
> >>> first separate external contracts, default/battery implementation,
> >>> extension implementations, and provider-facing replacement points; then
> >>> decide implementation. I think that exercise applies well here because this
> >>> proposal touches several different boundary types at once: ingest protocol,
> >>> Polaris-native lineage model, persistence, query API, downstream
> >>> forwarding, auth, and dataset resolution.
> >>>
> >>> The questions I think we should separate are:
> >>>
> >>> 1. *OpenLineage compatibility:* Do we require existing OpenLineage
> >>>    clients to emit to Polaris by changing only the endpoint/config?
> >>>    - If yes, then a server-side OpenLineage-compatible adapter
> >>>      endpoint makes sense.
> >>>    - If not, another option is a Polaris-provided OpenLineage
> >>>      transport/client shim that reshapes OpenLineage events into a
> >>>      Polaris-native lineage API.
> >>>    - Those are different adoption tradeoffs, and I think we should
> >>>      choose intentionally rather than letting OpenLineage compatibility
> >>>      implicitly define the Polaris-native API.
> >>> 2. *Polaris-native lineage model:* Should the long-term Polaris
> >>>    lineage model/query API be OpenLineage-specific, or framework-agnostic
> >>>    with OpenLineage as one adapter?
> >>>    - My preference is the latter. OpenLineage compatibility is
> >>>      useful, but I would avoid making the OpenLineage payload shape the
> >>>      Polaris-native lineage model by accident.
> >>> 3. *Default battery behavior:* What should work out of the box?
> >>>    - If query is part of the initial release, I think the battery
> >>>      needs enough local state to answer a minimal query. A narrow default
> >>>      could be: latest observed direct table-level upstreams for a
> >>>      Polaris-managed target table, with observed timestamp,
> >>>      producer/engine identifier, and upstream dataset refs.
> >>> 4. *Extension implementations:* What should be pluggable or future
> >>>    work?
> >>>    - I would put raw OpenLineage forwarding/proxying, external
> >>>      backend query, full graph history, multi-hop traversal, column-level
> >>>      query, job/run graph, pruning/staleness, and richer governance-aware
> >>>      behavior into extension/future implementation areas rather than the
> >>>      default battery.
> >>>
> >>> *One subtle point*: I do not think the default battery and the REST/API
> >>> envelope need to have exactly the same scope.
> >>>
> >>> The default battery can be intentionally small. For example, latest
> >>> direct table-level lineage summary for Polaris-managed target tables. *But
> >>> the REST/API envelope can still be designed so that richer implementations
> >>> are possible later or through extensions*. For example, the API can
> >>> carry metadata such as *granularity (table/col/job etc.), format/source
> >>> protocol (OpenLineage or other lineage framework)*, or requested mode
> >>> to help Polaris route handling to the configured provider, without
> >>> requiring every default implementation to support every mode.
> >>>
> >>> Said differently, I would separate:
> >>>
> >>> - what the API envelope can represent;
> >>> - what the default battery actually guarantees;
> >>> - what extension implementations can support.
> >>>
> >>> *My concrete recommendation would be*:
> >>>
> >>> If Polaris exposes a lineage Query API in the initial release, the
> >>> default battery should provide a minimal latest table-level summary
> >>> implementation so the query works out of the box. If we do not want any
> >>> local persistence in the initial release, then I think the Query API should
> >>> be out of scope for the initial release or clearly extension-provided. I
> >>> would avoid exposing a core query API whose default implementation cannot
> >>> answer anything.
> >>>
> >>> *My preferred shape would be*:
> >>>
> >>> - Polaris-native lineage semantics stay *framework-agnostic*.
> >>> - OpenLineage is supported as an adapter/adoption path, *not as the
> >>>   only Polaris lineage model*.
> >>> - The default battery, if query is in scope, is latest direct
> >>>   table-level lineage summary only.
> >>> - *The API envelope leaves room for richer provider implementations*.
> >>> - Full OpenLineage backend behavior, downstream forwarding/proxying,
> >>>   historical graph, column lineage, job/run lineage, multi-hop query,
> >>>   pruning/staleness, and external backend query *are extension or
> >>>   future work*.
> >>>
> >>> This would still give Polaris a useful out-of-the-box lineage
> >>> experience, while avoiding turning Polaris into a full lineage backend in
> >>> the first step.
> >>>
> >>> -ej
> >>>
> >>> On Mon, Jun 8, 2026 at 2:31 PM Adnan Hemani via dev <[email protected]> wrote:
> >>>
> >>>> Hi Robert,
> >>>>
> >>>> > Is my understanding correct that option 1 is out of scope from your
> >>>> > perspective, and option 2 is not sufficient for the M0 you have in
> >>>> > mind? In other words, you are proposing option 3 as the baseline,
> >>>> > with active planning toward option 4?
> >>>>
> >>>> Yes, that's correct. Happy to hear others' opinions, but Option 4 has
> >>>> been detailed in the proposal document since the very start. I'm happy
> >>>> to wait a few more days for others' opinions, but as of now I don't see
> >>>> any active opposition to the plans as-is and the "lazy consensus"
> >>>> suggested deadline was over 2 weeks ago. I-Ting and I will start
> >>>> implementation in the meantime.
> >>>>
> >>>> Best,
> >>>> Adnan Hemani
> >>>>
> >>>> On Mon, Jun 8, 2026 at 3:19 AM Robert Stupp <[email protected]> wrote:
> >>>>
> >>>> > Hi all,
> >>>> >
> >>>> > Thanks Adnan, that helps clarify the shape.
> >>>> >
> >>>> > I think this is the point where broader community input would be
> >>>> > useful, because options 3/4 are a materially different commitment
> >>>> > from options 1/2.
> >>>> >
> >>>> > Is my understanding correct that option 1 is out of scope from your
> >>>> > perspective, and option 2 is not sufficient for the M0 you have in
> >>>> > mind? In other words, you are proposing option 3 as the baseline,
> >>>> > with active planning toward option 4?
> >>>> >
> >>>> > Option 3 does not just put a proxy endpoint in Polaris.
> >>>> > It makes Polaris responsible for the OL ingest path: dataset-name
> >>>> > resolution, per-entity authZ over OL assertions, policy for
> >>>> > non-Polaris datasets, trusted-service credentials to downstream
> >>>> > systems, request-size and payload limits, forwarding failure
> >>>> > semantics, audit behavior, and tenant isolation.
> >>>> >
> >>>> > Option 4 then adds a Polaris-local lineage storage/query subsystem.
> >>>> > Even if the first version stores only a reduced projection, Polaris
> >>>> > would take on many responsibilities of an OL backend: persistence
> >>>> > semantics, query semantics, staleness/pruning, auth-filtered reads,
> >>>> > backend compatibility, migrations, limits, and long-term
> >>>> > compatibility with OL event shapes.
> >>>> > At that point, even if intentionally limited, Polaris effectively
> >>>> > operates as an OL backend for the supported subset.
> >>>> >
> >>>> > So before we treat option 3 plus active planning toward option 4 as
> >>>> > the M0 baseline, I think it would be good to hear whether others
> >>>> > agree that Polaris should take on that implementation and
> >>>> > maintenance surface for the first milestone.
> >>>> >
> >>>> > Or whether we should start with a smaller integration point first.
> >>>> >
> >>>> > Robert

--
Dmitri Bourlatchkov
Senior Staff Software Engineer, Dremio
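
For readers following the thread, the layering EJ describes (a format-specific ingress route, a Polaris-owned boundary behind it, and a small default implementation that keeps only the latest direct table-level upstreams) can be pictured with a minimal Python sketch. Everything named here (`LineageEdge`, `LineageSink`, `LatestUpstreamsSink`, `openlineage_adapter`, `ADAPTERS`, `ingest`) is hypothetical and for illustration only; it is not the Polaris API, not what the linked PR implements, and not a design the thread has agreed on. The adapter reads only the `eventTime`, `producer`, `inputs`, and `outputs` fields of an OpenLineage run event, and it assumes, as a simplification, that every output of a run depends on every input.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class LineageEdge:
    """Hypothetical Polaris-owned record: the direct upstreams of one table."""
    target: str
    upstreams: Tuple[str, ...]
    producer: str
    observed_at: datetime


class LineageSink(Protocol):
    """Hypothetical capability boundary that every ingress adapter writes to."""
    def record(self, edge: LineageEdge) -> None: ...


class LatestUpstreamsSink:
    """Small default implementation: keeps only the latest edge per target."""

    def __init__(self) -> None:
        self._latest: Dict[str, LineageEdge] = {}

    def record(self, edge: LineageEdge) -> None:
        current = self._latest.get(edge.target)
        # Ignore events that are older than what is already stored.
        if current is None or edge.observed_at >= current.observed_at:
            self._latest[edge.target] = edge

    def latest(self, target: str) -> Optional[LineageEdge]:
        return self._latest.get(target)


def openlineage_adapter(event: dict) -> List[LineageEdge]:
    """Reduce a raw OpenLineage run event to table-level edges.

    Simplifying assumption: every output depends on every input of the run.
    """
    def ref(dataset: dict) -> str:
        return f"{dataset['namespace']}/{dataset['name']}"

    observed = datetime.fromisoformat(event["eventTime"].replace("Z", "+00:00"))
    upstreams = tuple(ref(d) for d in event.get("inputs", []))
    return [
        LineageEdge(ref(out), upstreams, event.get("producer", ""), observed)
        for out in event.get("outputs", [])
    ]


# Format name (last path segment) -> adapter; only OpenLineage is registered.
ADAPTERS: Dict[str, Callable[[dict], List[LineageEdge]]] = {
    "openlineage": openlineage_adapter,
}


def ingest(path: str, event: dict, sink: LineageSink) -> int:
    """Route '<prefix>/lineage/<format>' to its adapter; return edges recorded."""
    prefix, _, fmt = path.rstrip("/").rpartition("/")
    if not prefix.endswith("/lineage") or fmt not in ADAPTERS:
        raise ValueError(f"unsupported lineage route: {path}")
    edges = ADAPTERS[fmt](event)
    for edge in edges:
        sink.record(edge)
    return len(edges)
```

In this sketch an existing OpenLineage producer would only need to be pointed at a path ending in `/lineage/openlineage`; a proxy or forwarder would be another `LineageSink`, and another lineage format would be another entry in `ADAPTERS`. Whether the route should be the bare `/lineage` path instead is exactly the open question in the thread, and the sketch does not settle it.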
