I'm in favor of deprecating this and cleaning it up, but it depends on usage. I suspect usage is low (or possibly non-existent, especially as there's little upside to moving away from the default). I cc'd user@ just in case anyone wants to chime in there. This may be a good thing to add to our release notes as well (perhaps we can get it into the one that's just about to go out).
Unless there is strong, justified pushback, I'd get the deprecation status documented (e.g. in the javadocs and on the website) right away. For actual removal, I agree with the idea of waiting until it actually causes issues or we move to the next major Beam release. That said, I'd push back on 2.66 as a bit too quick, even if the first condition is hit before then, and would give people at least a quarter's notice.

- Robert

On Mon, Mar 24, 2025 at 2:27 PM Yi Hu via dev <dev@beam.apache.org> wrote:

> Hi everyone,
>
> I would like to bring up discussion for deprecating Beam SQL's ZetaSQL
> component [1].
> Beam SQL currently serves with two SQL dialects (i) Apache Calcite and
> (ii) ZetaSQL dialects, see documentation [2] due to the following reasons
>
> - Developments in Beam for ZetaSQL dialect effectively stalled since early
> 2022 (See change history [3])
>
> - Despite incomplete support status, there is no new bug / feature request
> opened ever since we migrated to use GitHub Issue, suggesting minimal
> adoption [4]
>
> - We still need to keep zetasql up-to-date if its dependency conflicts
> with other google dependencies, as a result ZetaSQL component introduces
> maintenance burden when upgrading GCP-BOM (e.g. [5]).
>
> - One of the main reason that using ZetaSQL dialect, per [2], was because
>
> > ZetaSQL is more compatible with BigQuery, so it’s especially useful in
> > pipelines that write to or read from BigQuery tables.
>
> As of today, as GCP BigQuery now supports using GoogleSQL (open-sourced
> as ZetaSQL) querying data that's stored outside of BigQuery via BigQuery
> Connections API / Federated query [6, 7]. This largely provides an
> alternative for using Beam's ZetaSQL interacting with BigQuery.
>
> For these reasons, I propose initiating the process of deprecating
> Beam SQL's ZetaSQL component. There are two decisions needed to be made:
>
> Firstly, agree on when to document the deprecated status for ZetaSQL
> component in javadoc, beam website, currently I recommend do it in the
> release that currently HEAD belongs, that is Beam 2.65.0 (cut April 30,
> 2025)
>
> Secondly, stop publishing ZetaSQL artifacts. This is a breaking change,
> and I think we can leave the deprecated status as is until the following
> situation emerges, whichever comes first, and no earlier than Beam 2.66.0
> (cut Jun 11, 2025)
>
> - Continued support for ZetaSQL component involving significant burdens,
> like conflict with other Beam dependencies, supported Java versions, etc, or
> - When Beam moved to the next release major release (3)
>
> Thanks for your attention, and any input welcomed!
>
> Regards,
> Yi
>
> [1]
> https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/zetasql
> [2] https://beam.apache.org/documentation/dsls/sql/overview/
> [3]
> https://github.com/benEng/beam/commits/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java
> [4]
> https://github.com/apache/beam/issues?q=is%3Aissue%20%20label%3Azetasql%20
> [5] https://github.com/apache/beam/pull/32902
> [6] https://cloud.google.com/bigquery/docs/connections-api-intro
> [7] https://cloud.google.com/bigquery/docs/federated-queries-intro
>
> --
>
> Yi Hu, (he/him/his)
>
> Software Engineer