Hi Calcite community,

I recently asked a related question on [CALCITE-4496][1]. I would like to
bring the discussion to the dev list because the question is not only about
one JIRA item or implementation detail, but about the abstraction boundary
of Measure in SQL itself.

Over the past quarter, I have been exploring semantic layer / semantic
modeling DSL design. During this process, I looked at adjacent systems and
ideas such as [Looker][3], [Cube][4], [Malloy][5], [dbt Semantic Layer /
MetricFlow][6], and Calcite’s measure work. These systems all point to the
same need: reusable business calculations and governed analytical models.
But they draw the boundary differently. Some keep measures in an external
semantic model; Calcite’s SQL measure work moves reusable calculations into
SQL itself.

That boundary is the part I would like to discuss.

I read [CALCITE-4496][1] and Julian Hyde / John Fremlin’s paper [“Measures
in SQL”][2]. The core motivation makes sense to me. Traditional SQL views
cannot preserve reusable aggregate calculations very well. Once a view
turns `AVG`, ratios, margins, or similar calculations into ordinary
columns, downstream roll-ups can lose the original calculation semantics.
The classic example is “average of averages”.

`MEASURE` tries to solve this by making a measure column not just a
materialized value, but a calculation that can be re-evaluated in the outer
query context. This is a powerful idea.

My question is about where this abstraction should live.

In traditional SQL, a view can be understood as a relation. It produces
rows and columns; columns are values. Even if the view definition is
complex, downstream queries can treat it as a black-box relation.

A measure-bearing view is different. Some columns are regular row values,
while some columns are deferred aggregate calculations that can be
re-evaluated in an outer query context. The view is no longer only a
relation in the traditional value-level sense; it also carries reusable
calculation semantics.

That leads to a few questions.

1. Is a measure-bearing view still best understood as a relation, or as a
relation plus semantic calculation metadata?

In semantic modeling systems, dimensions and measures are different kinds
of objects. A dimension describes rows and can be selected, grouped,
filtered, or displayed. A measure describes how to compute an aggregate
over a set of rows, and its value depends on query context.

If SQL schema contains both regular columns and measure columns, users and
planners need to understand that they are not the same kind of column. Is
this the intended mental model for SQL tables with measures?

2. Should `MEASURE` live in SQL schema/plans, or remain external semantic
metadata expanded before planning?

For semantic layers built on Calcite, there seem to be two possible
directions:

```text
semantic model -> SQL with MEASURE -> Calcite expands / optimizes measures
```

or:

```text
semantic model -> ordinary SQL -> Calcite optimizes relational SQL
```

The first direction gives SQL a native reusable-calculation abstraction.
The second keeps SQL closer to the traditional relational model, and leaves
semantic complexity in the modeling layer.

I am trying to understand which direction better matches Calcite’s
long-term goal.

3. If `MEASURE` lives in SQL, how broad should its evaluation context
become?

The paper defines measures using context-sensitive expressions, evaluation
context, and the `AT` operator. That model is expressive.

In real semantic layer systems, query context often includes more than
`GROUP BY`. It may include:

- filters before aggregation vs filters after aggregation;
- joins that may change grain;
- time dimensions, time grain, and timezone rules;
- hidden dimensions or filter-only fields;
- policy filters and query parameters;
- derived views that preserve or reshape dimensionality.

Should SQL `MEASURE` stay focused on reusable aggregate calculations and
their evaluation context, while these broader semantic-layer concepts
remain external metadata? Or is the long-term direction for SQL measures to
cover more of this modeling context?

For context only, here is a design note from my semantic modeling
exploration. It is not a Calcite syntax proposal; it is only background for
why I am thinking about this boundary:

[Designing a Semantic Modeling DSL][7]

I would appreciate feedback on this tradeoff: is `MEASURE` the right
abstraction boundary for SQL, or should measures remain primarily a
semantic-layer concept that compiles down to ordinary SQL?

References:

[1]: https://issues.apache.org/jira/browse/CALCITE-4496
[2]: https://arxiv.org/pdf/2406.00251
[3]: https://cloud.google.com/looker/docs/reference/param-explore
[4]: https://cube.dev/docs/product/data-modeling/concepts
[5]: https://docs.malloydata.dev/documentation
[6]: https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl
[7]:
https://github.com/caicancai/semantic-modeling-dsl/blob/main/blog/designing-a-semantic-modeling-dsl-en.md

Reply via email to