Hi EJ,

Unfortunately, the "/lineage" API path is defined in the OpenLineage spec.
Changing that path for Polaris would require client-side changes, which
leads us back to the same situation you confirmed in your investigation.
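
To illustrate why the path matters, here is a rough sketch (not taken from the
PR). It assumes a producer that appends a fixed `/lineage` path to a
configurable base URL; the helper name `lineage_url`, the host names, and the
base paths are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Path segment the thread describes as defined by the OpenLineage spec.
LINEAGE_PATH = "/lineage"


def lineage_url(base_url: str) -> str:
    """Append the fixed lineage path to a configurable base URL (hypothetical helper)."""
    parts = urlsplit(base_url)
    # Drop any trailing slash so we never produce "//lineage".
    path = parts.path.rstrip("/") + LINEAGE_PATH
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Retargeting a producer only changes the base URL, which is a config value.
# Both hosts and base paths below are made up for illustration.
existing_backend = lineage_url("http://lineage-backend.example.com:5000/api/v1")
polaris_backend = lineage_url("https://polaris.example.com/some/base/")
```

Under that assumption, pointing a producer at Polaris is a config change, while
any path other than `/lineage` would need a change in the client itself.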

I agree with tackling the implementation similarly to what you've outlined.
However, breaking this design into those topics may do more harm than good,
because all of these topics must work hand-in-hand design-wise and no other
non-OpenLineage proposals for Data Lineage are expected in the near future.
I'd ask everyone to please review the initial PR that sets up the Ingest API
in Polaris: https://github.com/apache/polaris/pull/4667.

Best,
Adnan Hemani



On Fri, Jun 12, 2026 at 12:02 PM EJ Wang <[email protected]>
wrote:

> Hi Adnan,
>
> I think your point about adoption is right, and I'd revise part of my
> earlier framing after looking more closely at how existing OpenLineage
> integrations work.
>
> I was previously thinking too much about whether clients could emit a more
> Polaris-native or framework-agnostic payload. But that is probably not the
> right first-slice adoption model. Existing OL producers generally already
> emit OpenLineage events, and the common low-friction knob is the transport
> target (URL/endpoint), not a uniform way to wrap or reshape the event body.
>
> So I agree that the first slice should optimize for endpoint retargeting
> and raw OL event ingestion. Clients should not need to know that the
> backend is Polaris or learn a Polaris-specific payload shape.
>
> *The design question I'd still like us to make explicit is where the
> OpenLineage specificity lives*. My preference would be to make it
> explicit at the ingress/API layer, for example with an OpenLineage-specific
> route under the lineage namespace such as: /.../lineage/openlineage
>
> That still preserves endpoint-retargeting for existing OL producers, while
> avoiding ambiguity about whether the generic `/lineage` namespace is an
> OpenLineage contract or a broader Polaris lineage namespace. It also leaves
> room for future `/lineage/<format>` ingress adapters if Polaris later
> supports other lineage formats or frameworks.
>
> Behind that ingress route, I'd like to keep the platform boundary
> Polaris-owned. I would separate:
>
> 1. *OpenLineage REST ingress/API*: an OL-aware endpoint that accepts raw
> OL events.
> 2. *Polaris lineage capability boundary*: a Polaris-owned contract behind
> ingress.
> 3. *Default/OOTB implementation*: a small bundled implementation that
> proves the SPI capability works end-to-end (correctly encapsulated and
> sufficiently exposed for extension implementations).
> 4. *Extension implementations*: richer provider/proxy/forwarder/custom
> behavior for deployments that need it.
>
> This is not meant to reduce OpenLineage support. Quite the opposite:
> OpenLineage can be the first explicitly supported ingress format. The point
> is to make the specificity explicit where it belongs, so Polaris can
> support OpenLineage well now while preserving room for future contributions
> in the right layer.
>
> *With that framing, I'd suggest*:
> - Initial PR: OpenLineage-specific ingress + Polaris lineage capability
> boundary + minimal default/OOTB path.
> - Follow-up PRs: proxy/forwarder/custom provider implementations and
> richer behavior.
> - Query/persistence semantics: separate unless this proposal is explicitly
> adding a read/query API.
>
> I think that would support the adoption goal you described, while keeping
> Polaris extensible in an organized way.
>
> -ej
>
> On Thu, Jun 11, 2026 at 8:19 PM Adnan Hemani <[email protected]>
> wrote:
>
>> Hi EJ,
>>
>> Thanks for looking at the proposal. I've responded to most of your
>> comments on the document itself, but I'll summarize the stances here to
>> close the loop.
>>
>> I am consciously making an effort to let the OpenLineage standard drive
>> the requirements here; this is a feature, not a bug. IMO, OpenLineage is
>> by far the most widely used standard for data lineage; I don't even know of
>> any other significant competitors. Big Data engines like Spark and Trino,
>> which represent a significant use case for Polaris, have OpenLineage
>> integrations and nothing else. Going the extra mile to decouple our
>> lineage implementations from OpenLineage will likely not produce a
>> worthwhile return on the effort. Happy to hear any other thoughts on
>> this topic.
>>
>> I also don't agree that Polaris should morph into a full-fledged
>> OpenLineage server. I don't think the Polaris community is attempting to
>> make a "Swiss-Army Knife" tool out of Polaris. For major lineage use cases,
>> users absolutely should be redirected to other servers like Marquez where
>> they can get full graph history, multi-hop traversal, jobs/runs info, etc.
>> I disagree with the "extensions" piece of your email based on this
>> reasoning.
>>
>> Regarding the "out-of-the-box" experience: I have no doubt that Polaris
>> cannot have lineage information without some setup. An admin must take a
>> small step to configure how they want to enable lineage data persistence:
>> either Polaris-local persistence or the passthrough/proxy/AuthZ layer
>> modes. I think you've missed some of the points in the mailing thread
>> replies above; the Query API is really only helpful when using the
>> Polaris-local persistence mode. The current plan is to build toward
>> "passthrough" mode first, with plans to support the Polaris-local
>> implementation soon afterward. A Query API won't be introduced until the
>> Polaris-local implementation work begins, so there will never be a Query
>> API that cannot return data to the user. You can see this in my first PR,
>> where only the Ingest API is implemented:
>> https://github.com/apache/polaris/pull/4667.
>>
>> One last note/suggestion for you: the term "default battery" on its own
>> generally doesn't make much sense. I'm only able to piece together your
>> comments because you used the phrase "batteries included" in this morning's
>> community sync. I would usually use "out-of-the-box (OOTB)" or "default
>> implementation". Using similar terms in the future would improve
>> readability in general.
>>
>> Best,
>> Adnan Hemani
>>
>> On Thu, Jun 11, 2026 at 4:12 PM EJ Wang <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I read through the proposal and the comments. One framing that may help
>>> us converge is to split the proposal into a few separate decisions instead
>>> of reviewing it as one bundled “OpenLineage support in Polaris” feature.
>>>
>>> This seems related to a broader direction I understand for Polaris as a
>>> platform: it should be flexible enough to support different deployment and
>>> integration use cases, but still battery-included enough to be useful out
>>> of the box. For lineage, I think that means we should explicitly separate:
>>> what Polaris promises as native lineage semantics, what the default battery
>>> implementation does, and what should remain pluggable for richer or
>>> deployment-specific implementations.
>>>
>>> I have been using a similar exercise in a recent SPI proposal draft:
>>> first separate external contracts, default/battery implementation,
>>> extension implementations, and provider-facing replacement points; then
>>> decide implementation. I think that exercise applies well here because this
>>> proposal touches several different boundary types at once: ingest protocol,
>>> Polaris-native lineage model, persistence, query API, downstream
>>> forwarding, auth, and dataset resolution.
>>>
>>> The questions I think we should separate are:
>>>
>>>    1. *OpenLineage compatibility:* Do we require existing OpenLineage
>>>    clients to emit to Polaris by changing only the endpoint/config?
>>>       - If yes, then a server-side OpenLineage-compatible adapter
>>>       endpoint makes sense.
>>>       - If not, another option is a Polaris-provided OpenLineage
>>>       transport/client shim that reshapes OpenLineage events into a
>>>       Polaris-native lineage API.
>>>       - Those are different adoption tradeoffs, and I think we should
>>>       choose intentionally rather than letting OpenLineage compatibility
>>>       implicitly define the Polaris-native API.
>>>    2. *Polaris-native lineage model:* Should the long-term Polaris
>>>    lineage model/query API be OpenLineage-specific, or framework-agnostic
>>>    with OpenLineage as one adapter?
>>>       - My preference is the latter. OpenLineage compatibility is
>>>       useful, but I would avoid making the OpenLineage payload shape the
>>>       Polaris-native lineage model by accident.
>>>    3. *Default battery behavior:* What should work out of the box?
>>>       - If query is part of the initial release, I think the battery
>>>       needs enough local state to answer a minimal query. A narrow default
>>>       could be: latest observed direct table-level upstreams for a
>>>       Polaris-managed target table, with observed timestamp,
>>>       producer/engine identifier, and upstream dataset refs.
>>>    4. *Extension implementations:* What should be pluggable or future
>>>    work?
>>>       - I would put raw OpenLineage forwarding/proxying, external
>>>       backend query, full graph history, multi-hop traversal, column-level
>>>       query, job/run graph, pruning/staleness, and richer governance-aware
>>>       behavior into extension/future implementation areas rather than the
>>>       default battery.
>>>
>>> *One subtle point*: I do not think the default battery and the REST/API
>>> envelope need to have exactly the same scope.
>>>
>>> The default battery can be intentionally small. For example, latest
>>> direct table-level lineage summary for Polaris-managed target tables. *But
>>> the REST/API envelope can still be designed so that richer implementations
>>> are possible later or through extensions*. For example, the API can
>>> carry metadata such as *granularity (table/col/job etc.), format/source
>>> protocol (OpenLineage or other lineage framework)*, or requested mode
>>> to help Polaris route handling to the configured provider, without
>>> requiring every default implementation to support every mode.
>>>
>>> Said differently, I would separate:
>>>
>>>    - what the API envelope can represent;
>>>    - what the default battery actually guarantees;
>>>    - what extension implementations can support.
>>>
>>> *My concrete recommendation would be*:
>>>
>>> If Polaris exposes a lineage Query API in the initial release, the
>>> default battery should provide a minimal latest table-level summary
>>> implementation so the query works out of the box. If we do not want any
>>> local persistence in the initial release, then I think the Query API should
>>> be out of scope for the initial release or clearly extension-provided. I
>>> would avoid exposing a core query API whose default implementation cannot
>>> answer anything.
>>>
>>> *My preferred shape would be*:
>>>
>>>    - Polaris-native lineage semantics stay *framework-agnostic*.
>>>    - OpenLineage is supported as an adapter/adoption path, *not as the
>>>    only Polaris lineage model*.
>>>    - The default battery, if query is in scope, is latest direct
>>>    table-level lineage summary only.
>>>    - *The API envelope leaves room for richer provider implementations*.
>>>    - Full OpenLineage backend behavior, downstream forwarding/proxying,
>>>    historical graph, column lineage, job/run lineage, multi-hop query,
>>>    pruning/staleness, and external backend query *are extension or
>>>    future work*.
>>>
>>> This would still give Polaris a useful out-of-the-box lineage
>>> experience, while avoiding turning Polaris into a full lineage backend in
>>> the first step.
>>>
>>> -ej
>>>
>>> On Mon, Jun 8, 2026 at 2:31 PM Adnan Hemani via dev <
>>> [email protected]> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> > Is my understanding correct that option 1 is out of scope from your
>>>> > perspective, and option 2 is not sufficient for the M0 you have in
>>>> > mind? In other words, you are proposing option 3 as the baseline, with
>>>> > active planning toward option 4?
>>>>
>>>> Yes, that's correct. Happy to hear others' opinions, but Option 4 has
>>>> been detailed in the proposal document since the very start. I'm happy
>>>> to wait a few more days for others' opinions, but as of now I don't see
>>>> any active opposition to the plans as-is and the "lazy consensus"
>>>> suggested deadline was over 2 weeks ago. I-Ting and I will start
>>>> implementation in the meantime.
>>>>
>>>> Best,
>>>> Adnan Hemani
>>>>
>>>> On Mon, Jun 8, 2026 at 3:19 AM Robert Stupp <[email protected]> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > Thanks Adnan, that helps clarify the shape.
>>>> >
>>>> > I think this is the point where broader community input would be
>>>> > useful, because options 3/4 are a materially different commitment from
>>>> > options 1/2.
>>>> >
>>>> > Is my understanding correct that option 1 is out of scope from your
>>>> > perspective, and option 2 is not sufficient for the M0 you have in
>>>> > mind? In other words, you are proposing option 3 as the baseline, with
>>>> > active planning toward option 4?
>>>> >
>>>> > Option 3 does not just put a proxy endpoint in Polaris.
>>>> > It makes Polaris responsible for the OL ingest path: dataset-name
>>>> > resolution, per-entity authZ over OL assertions, policy for
>>>> > non-Polaris datasets, trusted-service credentials to downstream
>>>> > systems, request-size and payload limits, forwarding failure
>>>> > semantics, audit behavior, and tenant isolation.
>>>> >
>>>> > Option 4 then adds a Polaris-local lineage storage/query subsystem.
>>>> > Even if the first version stores only a reduced projection, Polaris
>>>> > would take on many responsibilities of an OL backend: persistence
>>>> > semantics, query semantics, staleness/pruning, auth-filtered reads,
>>>> > backend compatibility, migrations, limits, and long-term compatibility
>>>> > with OL event shapes.
>>>> > At that point, even if intentionally limited, Polaris effectively
>>>> > operates as an OL backend for the supported subset.
>>>> >
>>>> > So before we treat option 3 plus active planning toward option 4 as
>>>> > the M0 baseline, I think it would be good to hear whether others agree
>>>> > that Polaris should take on that implementation and maintenance
>>>> > surface for the first milestone.
>>>> >
>>>> > Or whether we should start with a smaller integration point first.
>>>> >
>>>> > Robert
>>>> >
>>>>
>>>
