Hi everyone,

I would like to start a discussion about deprecating Beam SQL's ZetaSQL
component [1].
Beam SQL currently supports two SQL dialects, (i) Apache Calcite and (ii)
ZetaSQL (see documentation [2]). I suggest deprecating ZetaSQL for the
following reasons:

- Development of the ZetaSQL dialect in Beam has effectively stalled since
early 2022 (see change history [3]).

- Despite its incomplete support status, no new bugs or feature requests
have been opened since we migrated to GitHub Issues, which suggests minimal
adoption [4].

- We still need to keep ZetaSQL up to date whenever its dependencies
conflict with other Google dependencies. As a result, the ZetaSQL component
adds maintenance burden when upgrading the GCP-BOM (e.g. [5]).

- One of the main reasons for using the ZetaSQL dialect, per [2], was:

> ZetaSQL is more compatible with BigQuery, so it’s especially useful in
pipelines that write to or read from BigQuery tables.

  Today, however, BigQuery supports using GoogleSQL (open-sourced as
ZetaSQL) to query data stored outside of BigQuery via the BigQuery
Connections API and federated queries [6, 7]. This largely provides an
alternative to using Beam's ZetaSQL dialect to interact with BigQuery.

For these reasons, I propose initiating the process of deprecating
Beam SQL's ZetaSQL component. There are two decisions to be made:

First, agree on when to document the deprecated status of the ZetaSQL
component in the Javadoc and on the Beam website. I recommend doing it in
the release that HEAD currently belongs to, that is, Beam 2.65.0 (cut
April 30, 2025).

Second, agree on when to stop publishing ZetaSQL artifacts. This is a
breaking change, so I think we can leave the component in its deprecated
state until one of the following situations arises, whichever comes first,
and no earlier than Beam 2.66.0 (cut June 11, 2025):

- Continued support for the ZetaSQL component involves significant burden,
such as conflicts with other Beam dependencies or supported Java versions, or
- Beam moves to its next major release (Beam 3).

Thanks for your attention; any input is welcome!

Regards,
Yi

[1] https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/zetasql
[2] https://beam.apache.org/documentation/dsls/sql/overview/
[3] https://github.com/apache/beam/commits/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java
[4] https://github.com/apache/beam/issues?q=is%3Aissue%20%20label%3Azetasql%20
[5] https://github.com/apache/beam/pull/32902
[6] https://cloud.google.com/bigquery/docs/connections-api-intro
[7] https://cloud.google.com/bigquery/docs/federated-queries-intro

-- 

Yi Hu, (he/him/his)

Software Engineer
