Hi Adam and Adnan,

Thanks for the thoughtful review and discussion.

I agree that semantic drift and reusability are important challenges. The
current proposal does not fully address reusing dataset-level semantic
definitions across multiple semantic models. Longer term, I think it would
be worth exploring reusable dataset entities (standalone entities rather
than parts of a JSON blob) and hierarchical semantic models. I also agree
that this is likely something we should work through with the OSI community,
since the problem extends beyond Polaris itself.
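
To make the drift concern concrete, here is a minimal sketch contrasting
embedded dataset copies with references to a shared dataset entity. It is
not the proposed Polaris API or the OSI schema; every name in it (Dataset,
EmbeddedModel, ReferencingModel, dataset_refs, registry) is hypothetical and
used only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; they are not part of the
# proposal, the Polaris API, or the OSI specification.

@dataclass
class Dataset:
    name: str
    description: str

@dataclass
class EmbeddedModel:
    """Semantic model that carries its own copy of each dataset."""
    name: str
    datasets: dict[str, Dataset] = field(default_factory=dict)

@dataclass
class ReferencingModel:
    """Semantic model that only stores the names of shared datasets."""
    name: str
    dataset_refs: list[str] = field(default_factory=list)

    def resolve(self, registry: dict[str, Dataset]) -> list[Dataset]:
        # Look up each referenced dataset in the shared registry.
        return [registry[ref] for ref in self.dataset_refs]

# Embedded copies: editing one model leaves the other stale (drift).
sales = EmbeddedModel("sales", {"orders": Dataset("orders", "All orders")})
finance = EmbeddedModel("finance", {"orders": Dataset("orders", "All orders")})
sales.datasets["orders"].description = "Completed orders only"
embedded_drifted = (
    sales.datasets["orders"].description != finance.datasets["orders"].description
)

# Shared entity: both models resolve to the same, single definition.
registry = {"orders": Dataset("orders", "All orders")}
sales_ref = ReferencingModel("sales", ["orders"])
finance_ref = ReferencingModel("finance", ["orders"])
registry["orders"].description = "Completed orders only"
shared_consistent = sales_ref.resolve(registry) == finance_ref.resolve(registry)
```

In the embedded case the two models disagree after a single edit, while in
the referenced case one update is visible to both. How such references would
be expressed in OSI or Polaris is exactly the open question.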

On descriptions and metadata reuse, I think there is merit in leveraging
existing table- and field-level documentation where appropriate. At the same
time, I understand the concerns about coupling semantic definitions too
tightly to underlying table metadata, especially as Polaris expands beyond
Iceberg-specific use cases. I'd like to gather more feedback from both the
Polaris and OSI communities here.

Overall, I agree these are important areas to improve, but I don't think
they should block the initial proposal. My hope is that we can establish a
solid foundation, gain some real-world experience, and then evolve the
model together based on community feedback.

Thanks again for the feedback and for helping move this discussion forward.

I've opened an initial PR as well:
https://github.com/apache/polaris/pull/4816. Please take a look when you
have a chance.

Best,

Yufei


On Tue, Jun 16, 2026 at 8:07 AM Adam Christian <
[email protected]> wrote:

> Howdy Adnan,
>
> 1. Re: Semantic Drift: I agree with your concept of a Dataset model being
> some sort of nested objects underneath a Table/View. To me, that makes a
> lot more sense than just having it in the OSI spec outside of the table or
> view. That being said, I think we are aligned to move forward with this
> proposal and adjust as necessary.
> 2. Re: Descriptions & Iceberg Properties: I am unsure if the purposes are
> different for a Semantic Model Dataset Description and the Iceberg Table
> Property comment. Firstly, this is the approach the OSI community has taken
> with their converters. [1] Secondly, the Iceberg Table Property comment is
> defined as "a table-level description that documents the business meaning
> and usage context." [2] and the Semantic Model Dataset Description is
> defined as a "Human-readable description" [3]. These two seem to serve the
> same purpose. Now, you are right that Generic Tables do not support a
> comment property, however, I wonder if that is more about a missing
> component from Generic Table rather than an issue with using the comments
> as already defined. Table comments are pretty standard across the database
> world: "COMMENT ON TABLE employees IS 'Stores corporate employee
> profiles';" is something you can do in Snowflake, PostgreSQL, Oracle,
> Databricks, etc. That being said, I don't want to impede this proposal,
> as we can always adjust when we get user feedback.
>
> [1] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris#export-osi--polaris-1
> [2] - https://iceberg.apache.org/docs/latest/configuration/#informational-properties
> [3] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md#schema-1
>
> Go community,
>
> Adam
>
> On Fri, Jun 12, 2026 at 8:29 PM Adnan Hemani via dev <
> [email protected]>
> wrote:
>
> > Hi Adam,
> >
> > I am definitely not completely up to date on this proposal, so excuse me
> > if I'm missing something here. A few points I'd like to double-click on:
> >
> > * I agree with your point about Semantic Drift - and we should work
> > towards allowing the reuse of dataset information across semantic models.
> > I'd prefer we try Option 2 to build this directly into OSI first and if
> > that does not make sense, we can then consider dynamically generating
> > Semantic Models from within Polaris. Alternatively, if it's possible to
> > build the Dataset model into nested objects underneath a Table/View in
> > Polaris, that might also make sense to me.
> > * I'm not sure we should rely on Iceberg Properties to model the dataset.
> > Although re-using it is surely tempting, I don't think we should take a
> > dependency on this approach which was not built for this purpose.
> > Additionally, this may cause issues for our Generic Table support in the
> > OSI model, since Generic Tables don't have those table properties.
> > Conceptually, keeping the semantic information within Polaris rather than
> > the data plane still seems right to me.
> >
> > Happy to see this proposal moving forward!
> >
> > Best,
> > Adnan Hemani
> >
> > On Fri, Jun 12, 2026 at 8:03 AM Adam Christian <
> > [email protected]> wrote:
> >
> > > Hi community,
> > >
> > > I wanted to update you on the offline conversations between Yufei, JB,
> > > Dennis, and me.
> > >
> > > Overall, I am good to move forward with this proposal, although I have
> > > some concerns. My specific concerns are:
> > >
> > > 1. Semantic Drift
> > >
> > > 2. Lack of Reusability
> > >
> > > #1 - Semantic Drift: This proposal adds a catalog entity that houses an
> > > OSI Semantic Model. The OSI Semantic Model contains Datasets, each of
> > > which represents a table or a view with additional attributes [1]. In
> > > this proposal, there is currently no way to centralize a dataset’s
> > > semantic attributes. If a user wants to have two semantic models refer
> > > to a single dataset, they must duplicate the semantic attributes. In my
> > > opinion, this runs counter to the goal of avoiding the “inconsistent
> > > definitions, duplicated effort” that Yufei mentioned above.
> > >
> > > There are two alternatives that could handle this:
> > >
> > > 1. Store semantic attributes on the table or view, then dynamically
> > > generate the OSI Semantic Model from the referenced datasets
> > >
> > > 2. Work with the OSI Team to propose hierarchical Semantic Models
> > >
> > > The second alternative is backwards compatible with this proposal, but
> > > requires a change to the OSI Specification. The first can be done today
> > > and aligns better with the current converters in the OSI repository
> > > [2], but would be more costly to implement. It could also be made
> > > backward-compatible with the current proposal by adding an additional
> > > parameter to the GET for Semantic Models.
> > >
> > > #2 - Lack of Reusability: There are several attributes stored on
> > > Datasets that would be helpful for other consumers. For example, in OSI,
> > > Datasets and Fields have descriptions. These seem equivalent to a comment
> > > in an Iceberg Table Property or a doc field on the Schema’s NestedField.
> > > These comments are already widely supported by current Iceberg consumers,
> > > and the current Polaris OSI Converter already leverages them [2]. Rather
> > > than inventing a new attribute, we could use the existing ones.
> > >
> > > Now, this is opinionated, and a user might want an Iceberg Table
> > > Property to be different from their Semantic Model. The current
> > > proposal moves forward with storing a different attribute.
> > >
> > > Given that the concerns above can be handled in a backwards-compatible
> > > manner, I believe the value of delivering this work now outweighs
> > > waiting for a perfect solution. The perfect is the enemy of the good in
> > > this case.
> > >
> > > Go community,
> > >
> > >
> > > Adam
> > >
> > > [1] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> > >
> > > [2] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris
> > >
> > >
> > > On Fri, May 29, 2026 at 6:35 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > As AI agents, BI tools, notebooks, and query engines increasingly
> > > > consume the same data, semantic definitions such as metrics and
> > > > dimensions are often duplicated across multiple systems. This leads to
> > > > inconsistent definitions, duplicated effort, and governance challenges.
> > > > The rise of AI agents further amplifies this problem, as agents rely on
> > > > semantic context to understand data and reason about business concepts.
> > > > Without a shared semantic layer, organizations often end up maintaining
> > > > multiple versions of the same business definitions across tools and
> > > > applications.
> > > >
> > > > JB and I would like to start a discussion on adding semantic layer
> > > > support to Apache Polaris so semantic models can be defined once,
> > > > governed centrally, and consumed consistently across tools. The
> > > > proposal [1] introduces semantic models as a first-class Polaris entity
> > > > using the Open Semantic Interchange (OSI) [2] specification [3]. At a
> > > > high level, the proposal adds:
> > > >
> > > >    - A new SEMANTIC_MODEL entity type
> > > >    - CRUD APIs for semantic models
> > > >    - Schema validation and authorization
> > > >
> > > > Polaris remains a metadata service and does not execute metrics or
> > > > semantic queries.
> > > > Feedback on the overall direction, design, and OSI adoption would be
> > > > greatly appreciated.
> > > >
> > > > 1. https://docs.google.com/document/d/1ZdI-1w_5LbyCMhvUhLCtOt-N1Z89L2P-oiGLaYayCZg/edit?usp=sharing
> > > > 2. https://open-semantic-interchange.org
> > > > 3. https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> > > >
> > > >
> > > > Yufei
> > > >
> > >
> >
>
