Hi Adam,

I'm not fully up to date on this proposal, so excuse me if I'm missing
something here. A few points I'd like to dig into:

* I agree with your point about Semantic Drift, and we should work
towards allowing the reuse of dataset information across semantic models.
I'd prefer we first try Option 2 and build this directly into OSI. If
that does not make sense, we can then consider dynamically generating
Semantic Models from within Polaris. Alternatively, if it's possible to
model the Dataset as nested objects underneath a Table/View in Polaris,
that might also make sense to me.
* I'm not sure we should rely on Iceberg Properties to model the dataset.
Although reusing them is surely tempting, I don't think we should take a
dependency on a mechanism that was not built for this purpose.
Additionally, this may cause issues for our Generic Table support in the
OSI model, since Generic Tables don't have those table properties.
Conceptually, keeping the Semantic information within Polaris rather than
the data plane still seems right to me.
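
To make the reuse point concrete, here is a minimal sketch of semantic models
that reference shared dataset definitions instead of copying them. It is only
an illustration: the dictionary layout, the `sales.orders` identifier, and the
`resolve_model` helper are all hypothetical and do not come from the OSI
specification or from Polaris.

```python
from copy import deepcopy

# Hypothetical store of dataset-level semantic attributes, keyed by table name.
DATASETS = {
    "sales.orders": {
        "description": "One row per customer order",
        "fields": {"order_total": {"description": "Order value in USD"}},
    },
}

# Hypothetical semantic models that reference datasets by name
# rather than duplicating their attributes.
SEMANTIC_MODELS = {
    "revenue": {"datasets": ["sales.orders"]},
    "retention": {"datasets": ["sales.orders"]},
}


def resolve_model(name):
    """Return the named semantic model with its referenced datasets inlined."""
    model = SEMANTIC_MODELS[name]
    return {
        "name": name,
        "datasets": [
            {"name": ref, **deepcopy(DATASETS[ref])} for ref in model["datasets"]
        ],
    }


# A single update to the shared dataset is visible to every model that uses it.
DATASETS["sales.orders"]["description"] = "One row per confirmed customer order"
revenue = resolve_model("revenue")
retention = resolve_model("retention")
```

Because both models resolve the same dataset entry, their descriptions cannot
drift apart, which is the property either option would need to provide.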

Happy to see this proposal moving forward!

Best,
Adnan Hemani

On Fri, Jun 12, 2026 at 8:03 AM Adam Christian <
[email protected]> wrote:

> Hi community,
>
> I wanted to update you on the offline conversations between Yufei, JB,
> Dennis, and me.
>
> Overall, I am good to move forward with this proposal although I have some
> concerns. My specific concerns are:
>
> 1. Semantic Drift
>
> 2. Lack of Reusability
>
> #1 - Semantic Drift: This proposal adds a catalog entity that houses an OSI
> Semantic Model. The OSI Semantic Model contains Datasets which represent a
> table or a view with additional attributes [1]. In this proposal, there is
> currently no way to centralize a dataset’s semantic attributes. If a user
> wants to have two semantic models refer to a single dataset, they must
> duplicate the semantic attributes. In my opinion, this works against the
> goal of avoiding the “inconsistent definitions, duplicated effort” that
> Yufei mentioned above.
>
> There are two alternatives that could handle this:
>
> 1. Store semantic attributes on the table or view, then dynamically
> generate the OSI Semantic Model from the referenced datasets
>
> 2. Work with the OSI Team to propose hierarchical Semantic Models
>
> The second alternative is backwards compatible with this proposal, but
> requires a change to the OSI Specification. The first can be done today but
> would be more costly to implement. The first option aligns better with the
> current converters in the OSI repository [2]. However, it could be made
> backward-compatible with the current proposal by adding an additional
> parameter to the GET for Semantic Models.
>
> #2 - Lack of Reusability: There are several attributes stored on Datasets
> which would be helpful for other consumers. For example, in OSI, Datasets
> and Fields have descriptions. These seem equivalent to a comment in an
> Iceberg Table Property or a doc field on the Schema’s NestedField. These
> comments are already widely supported by current Iceberg consumers and the
> current Polaris OSI Converter actually leverages this already [2]. Rather
> than reinventing a new attribute, we could use the ones there.
>
> Now, this is opinionated and a user might want an Iceberg Table Property to
> be different from their Semantic Model. The current proposal moves forward
> with storing a different attribute.
>
> Given that the concerns above can be handled in a backwards-compatible
> manner, I believe moving forward with this work is more valuable than
> waiting for a perfect solution. The perfect is the enemy of the good in
> this case.
>
> Go community,
>
>
> Adam
>
> [1] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
>
> [2] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris
>
>
> On Fri, May 29, 2026 at 6:35 PM Yufei Gu <[email protected]> wrote:
>
> > Hi folks,
> >
> > As AI agents, BI tools, notebooks, and query engines increasingly consume
> > the same data, semantic definitions such as metrics and dimensions are
> > often duplicated across multiple systems. This leads to inconsistent
> > definitions, duplicated effort, and governance challenges. The rise of AI
> > agents further amplifies this problem, as agents rely on semantic context
> > to understand data and reason about business concepts. Without a shared
> > semantic layer, organizations often end up maintaining multiple versions of
> > the same business definitions across tools and applications.
> >
> > JB and I would like to start a discussion on adding semantic layer support
> > to Apache Polaris so semantic models can be defined once, governed
> > centrally, and consumed consistently across tools. The proposal[1]
> > introduces semantic models as a first class Polaris entity using the Open
> > Semantic Interchange (OSI)[2] specification[3]. At a high level, the
> > proposal adds:
> >
> >    - A new SEMANTIC_MODEL entity type
> >    - CRUD APIs for semantic models
> >    - Schema validation and authorization
> >
> > Polaris remains a metadata service and does not execute metrics or semantic
> > queries.
> > Feedback on the overall direction, design, and OSI adoption would be
> > greatly appreciated.
> >
> > 1. https://docs.google.com/document/d/1ZdI-1w_5LbyCMhvUhLCtOt-N1Z89L2P-oiGLaYayCZg/edit?usp=sharing
> > 2. https://open-semantic-interchange.org
> > 3. https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> >
> >
> > Yufei
> >
>
