+1 to this deprecation. Thanks for putting together a clear summary.

FWIW it also has significantly worse performance than the Calcite SQL
dialect, since it calls out to a ZetaSQL subprocess for most calculations,
and that is less optimized than Beam's Fn API.

Kenn

On Tue, Mar 25, 2025 at 4:18 PM Robert Bradshaw via user <
u...@beam.apache.org> wrote:

> I'm in favor of deprecating this and cleaning it up, but it depends on
> usage. I suspect it is low (or possibly non-existent, especially as there's
> little upside to moving away from the default). I cc'd user@ just in case
> anyone wants to chime in there. This may be a good thing to add to our
> release notes as well (perhaps we can get it in the one that's just about
> to go out).
>
> Unless there is strong, justified pushback, I'd document the deprecation
> status (e.g. in the javadocs, on the website) right away. For actual
> removal, I agree with the idea of waiting until it actually causes issues
> or we move to the next major Beam release, though I might push back on
> 2.66 as being a bit too quick, even if the first condition is hit before
> then; I'd give people at least a quarter's notice.
>
> - Robert
>
>
> On Mon, Mar 24, 2025 at 2:27 PM Yi Hu via dev <dev@beam.apache.org> wrote:
>
>> Hi everyone,
>>
>> I would like to open a discussion on deprecating Beam SQL's ZetaSQL
>> component [1].
>> Beam SQL currently supports two SQL dialects, (i) Apache Calcite and
>> (ii) ZetaSQL; see the documentation [2]. I propose the deprecation for
>> the following reasons:
>>
>> - Development of the ZetaSQL dialect in Beam has effectively stalled
>> since early 2022 (see change history [3])
>>
>> - Despite its incomplete support status, no new bug reports or feature
>> requests have been opened since we migrated to GitHub Issues, suggesting
>> minimal adoption [4]
>>
>> - We still need to keep ZetaSQL up to date whenever its dependencies
>> conflict with other Google dependencies. As a result, the ZetaSQL
>> component adds maintenance burden when upgrading GCP-BOM (e.g. [5]).
>>
>> - One of the main reasons for using the ZetaSQL dialect, per [2], was:
>>
>> > ZetaSQL is more compatible with BigQuery, so it’s especially useful in
>> > pipelines that write to or read from BigQuery tables.
>>
>>   Today, BigQuery supports using GoogleSQL (open-sourced as ZetaSQL) to
>> query data stored outside of BigQuery via the BigQuery Connections API
>> and federated queries [6, 7]. This largely provides an alternative to
>> using Beam's ZetaSQL dialect to interact with BigQuery.
>>
>> For these reasons, I propose initiating the process of deprecating
>> Beam SQL's ZetaSQL component. Two decisions need to be made:
>>
>> First, agree on when to document the deprecated status of the ZetaSQL
>> component in the javadoc and on the Beam website. I recommend doing it in
>> the release that HEAD currently belongs to, that is, Beam 2.65.0 (cut
>> April 30, 2025).
>>
>> Second, decide when to stop publishing ZetaSQL artifacts. This is a
>> breaking change, and I think we can leave the deprecated status as is
>> until one of the following situations arises, whichever comes first, and
>> no earlier than Beam 2.66.0 (cut June 11, 2025):
>>
>> - Continued support for the ZetaSQL component involves significant
>> burden, such as conflicts with other Beam dependencies or supported Java
>> versions, or
>> - Beam moves to its next major release (3)
>>
>> Thanks for your attention, and any input is welcome!
>>
>> Regards,
>> Yi
>>
>> [1]
>> https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/zetasql
>> [2] https://beam.apache.org/documentation/dsls/sql/overview/
>> [3]
>> https://github.com/benEng/beam/commits/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java
>> [4]
>> https://github.com/apache/beam/issues?q=is%3Aissue%20%20label%3Azetasql%20
>> [5] https://github.com/apache/beam/pull/32902
>> [6] https://cloud.google.com/bigquery/docs/connections-api-intro
>> [7] https://cloud.google.com/bigquery/docs/federated-queries-intro
>>
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
>>
>>
>>
