Hi Adnan, I am generally supportive of Polaris integrating with OpenLineage, especially around configuration, identity, and dataset resolution. If Polaris must be inline for auth/enrichment, the gateway / resolver / forwarder shape seems like the most natural fit to me.

That said, I wonder whether the simpler M0 is to keep Polaris out of the OL event data path unless inline authorization/enrichment is a hard requirement. OL clients can already emit directly to Marquez/DataHub/OpenMetadata/etc. Polaris could still add value by exposing the OL transport configuration, auth/credential references, realm/tenant context, TLS settings, and naming conventions. That would avoid proxying large OL payloads through Polaris and would keep gateway, local storage, custom query APIs, and downstream query normalization from being bundled into one milestone.

For the Polaris-local storage and Polaris-specific query API parts, I think a few things still need clarification before PRs assume that scope.

First, could we make the security model more explicit? What exactly does LINEAGE_INGEST authorize: all inputs, all outputs, only outputs, or parent namespaces for CTAS-style outputs? What about external datasets that are not Polaris securables? Are OL events trusted metadata from privileged engine principals, or should Polaris validate submitted read/write claims? For forwarding, Polaris should use configured downstream credentials by default, not forward inbound Polaris bearer tokens or arbitrary request headers, and explicitly define what realm/principal/resolved-entity context is propagated. For reads, opaque nodes with preserved edges may still leak that hidden datasets exist and are connected to visible ones; that should be an explicit choice.

Second, could we define runtime limits and failure behavior? OL payloads are effectively unbounded by the spec and can get large through schemas, SQL, column lineage, data quality facets, debug facets, or custom facets. Even if Polaris stores only a projection, it still has to receive, parse, and maybe forward the full event. The proposal should define request/event/batch limits, facet and column-lineage limits, timeouts, backpressure/concurrency behavior, oversized-payload responses, and logging rules.
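To make the runtime-limits question more concrete, here is a rough sketch of the kind of pre-parse and post-parse checks I have in mind. Everything in it is hypothetical: the names (`IngestLimits`, `check_payload`, `count_facets`), the numeric values, the choice of 413 for every limit violation, and the assumption that a JSON array body is treated as a batch are placeholders for illustration, not something taken from the proposal or from the OL spec.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestLimits:
    # Placeholder values for illustration only, not proposed defaults.
    max_body_bytes: int = 1_000_000
    max_events_per_batch: int = 100
    max_facets_per_event: int = 50


def count_facets(event: dict) -> int:
    """Count top-level facets on run, job, and each input/output dataset.

    Simplified on purpose: it only looks at the "facets" maps and ignores
    nested structures such as column lineage entries.
    """
    total = 0
    for key in ("run", "job"):
        section = event.get(key)
        if isinstance(section, dict) and isinstance(section.get("facets"), dict):
            total += len(section["facets"])
    for key in ("inputs", "outputs"):
        datasets = event.get(key)
        if not isinstance(datasets, list):
            continue
        for dataset in datasets:
            if isinstance(dataset, dict) and isinstance(dataset.get("facets"), dict):
                total += len(dataset["facets"])
    return total


def check_payload(body: bytes, limits: IngestLimits = IngestLimits()) -> tuple[int, str]:
    """Return an (HTTP status, reason) pair for an inbound request body."""
    # Check the size first so an oversized body is never parsed.
    if len(body) > limits.max_body_bytes:
        return 413, "request body exceeds max_body_bytes"
    try:
        parsed = json.loads(body)
    except ValueError:
        return 400, "body is not valid JSON"
    # Assumption: a JSON array is a batch, anything else is a single event.
    events = parsed if isinstance(parsed, list) else [parsed]
    if len(events) > limits.max_events_per_batch:
        return 413, "batch exceeds max_events_per_batch"
    for event in events:
        if not isinstance(event, dict):
            return 400, "event is not a JSON object"
        if count_facets(event) > limits.max_facets_per_event:
            return 413, "event exceeds max_facets_per_event"
    return 200, "ok"
```

The point of the sketch is only that each limit needs a defined response and that the cheap checks should run before the expensive ones; timeouts, backpressure, and logging rules would need the same treatment.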
Third, could we clarify the user value of the reduced local store? The proposed local representation drops job/run history, run state, most facets, and process attribution. That may still be useful as a small dependency index for impact analysis or future catalog policy checks, but it is not what users usually get from querying an OL backend. Which queries is Polaris-local storage supposed to answer, and which are left to Marquez/DataHub/OpenMetadata/etc.?

Given those questions, my preferred scoping would be: if inline Polaris auth/enrichment is not required, start with configuration/discovery and let OL clients emit directly to the OL backend. If Polaris does need to be in the event path, scope M0 to proxy/gateway mode only. I would keep local storage and Polaris-specific lineage query APIs out of M0 until the product semantics, security model, runtime limits, and persistence model are clearer.

If local storage stays in scope, I think the proposal should define a backend-agnostic LineagePersistence contract first, instead of making relational tables the logical model. JDBC-only as the first implementation is fine, but the SPI should not bake in assumptions that make the NoSQL backend awkward later.

There are also a few core semantics that I think still need to be spelled out: collapsed edges, staleness/removal, dataset identity and Polaris entity resolution, query node identity, supported OL event types, batch behavior, and forwarding failure modes. In particular, a 200 from Polaris should have a clear meaning in each mode: local-only, fail-closed forwarding, fail-open forwarding, and passthrough-only.

So I am not against the OpenLineage integration direction. I mainly want to avoid treating "Polaris as an OL gateway" and "Polaris as a new lineage storage/query system" as the same milestone. The gateway/configuration pieces feel like a natural fit; the storage/query pieces still need more design work before I would be comfortable calling that part consensus.
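To illustrate the point above about what a 200 from Polaris should mean in each mode, here is one possible reading written out as a small decision function. The mode names come from the list above; the rest is my assumption for the sake of discussion: that both forwarding modes also write locally, that passthrough-only stores nothing, and that 500 and 502 are the failure statuses. `Mode` and `response_status` are hypothetical names, and the proposal may well want different semantics.

```python
from enum import Enum


class Mode(Enum):
    LOCAL_ONLY = "local-only"
    FAIL_CLOSED = "fail-closed forwarding"
    FAIL_OPEN = "fail-open forwarding"
    PASSTHROUGH = "passthrough-only"


def response_status(mode: Mode, stored_ok: bool, forward_ok: bool) -> int:
    """Map the outcome of the local write and the downstream forward to a status.

    Assumed semantics, one per mode:
      local-only   200 means the event was stored locally; nothing is forwarded.
      fail-closed  200 means stored locally AND accepted downstream.
      fail-open    200 means stored locally; a downstream failure is not surfaced.
      passthrough  200 means accepted downstream; nothing is stored locally.
    """
    if mode is Mode.LOCAL_ONLY:
        return 200 if stored_ok else 500
    if mode is Mode.PASSTHROUGH:
        return 200 if forward_ok else 502
    if mode is Mode.FAIL_OPEN:
        # The local write decides; the forward result is ignored here.
        return 200 if stored_ok else 500
    # Mode.FAIL_CLOSED: both steps must succeed.
    if not stored_ok:
        return 500
    return 200 if forward_ok else 502
```

Under this reading, the same downstream outage yields a 200 in fail-open mode and a 502 in fail-closed mode, which is exactly the kind of difference a client needs to know about before deciding whether to retry.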
Robert

On Mon, May 25, 2026 at 6:57 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Adnan,
>
> Thanks for the update and apologies again for the late review.
>
> I posted some comments in the docs; all of them are non-blocking. We can
> address them in the doc or PRs, if you prefer.
>
> However, for the sake of clarity, I'd like to better understand the idea
> behind collapsing OL edges.
>
> What is the use case for the resulting data?
>
> What kind of problems can it help to solve?
>
> How is "staleness pruning" supposed to work (mentioned in section "Edge
> Semantics")?
>
> Thanks,
> Dmitri.
>
> On Fri, May 22, 2026 at 2:50 PM Adnan Hemani via dev <[email protected]> wrote:
>
> > Hi Dmitri,
> >
> > My understanding is that we discuss threads in the community before
> > implementation to ensure alignment on the proposal's direction before
> > community members put time into the implementation. I'm fully aligned
> > that the implementation details are always up for discussion (both
> > before or after a proposal or even before/after a PR :) I just don't
> > want us to proceed with putting further time/effort in if the community
> > is not aligned on introducing these endpoints in general. I am claiming
> > "approv[al] by lazy consensus" for this direction because the doc has
> > been circulating in the ML for quite long without any objections.
> >
> > The old proposal has a different direction altogether and does not align
> > with where we now want to go with this proposal. I wanted to ensure there
> > was no confusion between the old (now-abandoned) proposal and the new
> > one. I-Ting is joining on as a co-author on the new proposal.
> >
> > We should definitely discuss this at the next Community Sync!
> >
> > Best,
> > Adnan Hemani
> >
> > On Fri, May 22, 2026 at 8:53 AM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi Adnan,
> > >
> > > Apologies for the lack of feedback (too many concurrent activities).
> > >
> > > However, this is a discussion thread, not a vote on a concrete
> > > implementation with a timeline. Lack of comment does not mean that
> > > people approve the design.
> > >
> > > If you prefer to proceed to concrete PRs, that would be fine from my
> > > POV, but please do not assume that PRs will not be challenged on
> > > aspects that were not already discussed.
> > >
> > > I believe it would be preferable to add this to the next Community Sync
> > > agenda to invigorate the discussion.
> > >
> > > Also, I'm not sure what the status is of this document / email thread
> > > with respect to I-Ting's old proposal [1]. Why not continue the
> > > discussion on that thread? WDYT?
> > >
> > > [1] https://lists.apache.org/thread/qqpq5hl1xrq8mwnd7kn4vgt8x9mqtvmg
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Fri, May 22, 2026 at 1:07 AM Adnan Hemani via dev <[email protected]> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> Since there hasn't been much traffic on this document or email
> > >> threads, I will consider the document approved by lazy consensus if
> > >> there are no further blocking comments by Friday, May 28th.
> > >>
> > >> Thanks!
> > >>
> > >> Best,
> > >> Adnan Hemani
> > >>
> > >> On Thu, May 14, 2026 at 1:49 PM Yufei Gu <[email protected]> wrote:
> > >>
> > >> > Hi Adnan,
> > >> >
> > >> > Thanks for putting this proposal together and resurfacing it in a
> > >> > dedicated thread.
> > >> >
> > >> > I did one round of review on the document already, and I think it
> > >> > would be great if more folks from the community could take a look
> > >> > and provide feedback as well. It would be especially helpful to get
> > >> > perspectives from people working on related areas before
> > >> > implementation starts.
> > >> >
> > >> > Thanks,
> > >> > Yufei
> > >> >
> > >> > On Wed, May 13, 2026 at 10:01 PM Adnan Hemani via dev <[email protected]> wrote:
> > >> >
> > >> >> Hi all,
> > >> >>
> > >> >> I wanted to ensure that the OpenLineage proposal I previously
> > >> >> posted in a different thread [1] was actually being found, given
> > >> >> that it was deep into the thread. I request the community to
> > >> >> review this proposal so we can potentially start implementation.
> > >> >>
> > >> >> Proposal:
> > >> >>
> > >> >> https://docs.google.com/document/d/1iOzIuFW66SFL2wZOADD9knMTG21OwY7VmaWVSvMUqQk/edit?tab=t.0#heading=h.59bmbnsf0gp1
> > >> >>
> > >> >> Best,
> > >> >> Adnan Hemani
> > >> >>
> > >> >> [1] https://lists.apache.org/thread/1fd6hrvx0v0s5wm6gh74cdo3yn4w1zhx
