sgrebnov opened a new pull request, #21296: URL: https://github.com/apache/datafusion/pull/21296
## Which issue does this PR close?

This PR improves `BigQueryDialect` so that the generated SQL is BigQuery-compatible (fixes runtime errors).

## What changes are included in this PR?

Eight `Dialect` trait overrides added to `BigQueryDialect`:

https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types

### 1. `date_field_extract_style` → `Extract` + `scalar_function_to_sql_overrides`

BigQuery does not support `date_part()`. TPC-H Q7, Q8, Q9 fail with `Function not found: date_part`.

| Before (error) | After |
|---|---|
| `date_part('YEAR', l_shipdate)` | `EXTRACT(YEAR FROM l_shipdate)` |

### 2. `interval_style` → `SQLStandard`

BigQuery does not support PostgreSQL-style interval abbreviations. TPC-H Q4, Q20 fail with `Syntax error: Unexpected ")"`.

| Before (error) | After |
|---|---|
| `INTERVAL '3 MONS'` | `INTERVAL '3' MONTH` |

### 3. `float64_ast_dtype` → `Float64`

BigQuery does not support `DOUBLE`. Fails with `Type not found: DOUBLE`.

| Before (error) | After |
|---|---|
| `CAST(a AS DOUBLE)` | `CAST(a AS FLOAT64)` |

### 4. `supports_column_alias_in_table_alias` → `false`

BigQuery does not support column aliases in table alias definitions. Fails with `Expected ")" but got "("`.

| Before (error) | After |
|---|---|
| `SELECT c.key FROM (...) AS c(key)` | `SELECT c.key FROM (SELECT o_orderkey AS key FROM orders) AS c` |

### 5. `utf8_cast_dtype` + `large_utf8_cast_dtype` → `String`

BigQuery does not support `VARCHAR`/`TEXT`. Fails with `Type not found: VARCHAR`, `Type not found: Text`.

| Before (error) | After |
|---|---|
| `CAST(a AS VARCHAR)` | `CAST(a AS STRING)` |
| `CAST(a AS TEXT)` | `CAST(a AS STRING)` |

### 6. `int64_cast_dtype` → `Int64`

BigQuery does not support `BIGINT`. Fails with `Type not found: BIGINT`.

> A 64-bit integer.
> SQL type name: INT64
> SQL aliases: INT, SMALLINT, INTEGER, BIGINT, TINYINT, BYTEINT

| Before (error) | After |
|---|---|
| `CAST(a AS BIGINT)` | `CAST(a AS INT64)` |

### 7. `timestamp_cast_dtype` → `Timestamp` (no timezone qualifier)

https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type

BigQuery does not support `TIMESTAMP WITH TIME ZONE`. Fails with `Syntax error: Expected ')' or keyword FORMAT but got keyword WITH`. `TIMESTAMP` should be used instead (it preserves time zone information).

| Before (error) | After |
|---|---|
| `CAST(a AS TIMESTAMP WITH TIME ZONE)` | `CAST(a AS TIMESTAMP)` |

## Are these changes tested?

Yes. Added a `test_bigquery_dialect_overrides` unit test covering all eight overrides, verified against BigQuery.

## Are there any user-facing changes?

No API changes. `BigQueryDialect` now generates valid BigQuery SQL for the affected expressions.
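
For readers who want the cast-type changes in items 3, 5, 6 and 7 at a glance, the mapping can be sketched as a small standalone function. This is only an illustration of the before/after tables in this description: `bigquery_cast_type` and `bigquery_cast` are hypothetical names, they are not part of DataFusion, and the real change is implemented through `Dialect` trait overrides rather than string matching.

```rust
/// Hypothetical helper (not DataFusion API): maps a generic SQL type name
/// to the BigQuery type name shown in the "After" column of the tables.
/// Names that are not listed in the PR description are returned unchanged
/// (upper-cased and trimmed).
fn bigquery_cast_type(generic: &str) -> String {
    let upper = generic.trim().to_ascii_uppercase();
    match upper.as_str() {
        // Item 3: DOUBLE is rejected with `Type not found: DOUBLE`.
        "DOUBLE" => "FLOAT64".to_string(),
        // Item 5: VARCHAR and TEXT both become STRING.
        "VARCHAR" | "TEXT" => "STRING".to_string(),
        // Item 6: BIGINT becomes INT64.
        "BIGINT" => "INT64".to_string(),
        // Item 7: the time zone qualifier is dropped.
        "TIMESTAMP WITH TIME ZONE" => "TIMESTAMP".to_string(),
        other => other.to_string(),
    }
}

/// Hypothetical helper: renders a CAST expression using the mapped type name.
fn bigquery_cast(expr: &str, generic_type: &str) -> String {
    format!("CAST({} AS {})", expr, bigquery_cast_type(generic_type))
}
```

For example, `bigquery_cast("a", "DOUBLE")` yields `CAST(a AS FLOAT64)`, matching the table in item 3.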
