sgrebnov opened a new pull request, #21296:
URL: https://github.com/apache/datafusion/pull/21296

   ## Which issue does this PR close?
   
   This PR improves `BigQueryDialect` so that the generated SQL is BigQuery-compatible, fixing runtime errors that BigQuery currently raises for several constructs.
   
   ## What changes are included in this PR?
   
   Eight `Dialect` trait overrides added to `BigQueryDialect`:
   
   BigQuery data types reference: https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types
   
   ### 1. `date_field_extract_style` → `Extract` + `scalar_function_to_sql_overrides`
   
   BigQuery does not support `date_part()`, so TPC-H Q7, Q8, and Q9 fail with `Function not found: date_part`.
   
   | Before (error) | After |
   |---|---|
   | `date_part('YEAR', l_shipdate)` | `EXTRACT(YEAR FROM l_shipdate)` |
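
To make the `Extract` style concrete, here is a string-level sketch of the rewrite shown in the table. The helper `date_part_to_extract` is hypothetical: DataFusion's unparser builds `sqlparser` AST nodes rather than formatting strings, and this sketch does not validate the date part.

```rust
/// Hypothetical helper (not part of DataFusion): renders a `date_part`-style
/// call as the `EXTRACT(<part> FROM <expr>)` form that BigQuery accepts.
fn date_part_to_extract(field_literal: &str, expr_sql: &str) -> String {
    // `date_part` receives the field as a quoted string literal ('YEAR', 'month', ...);
    // EXTRACT expects an unquoted date part keyword. Upper-casing is cosmetic.
    let part = field_literal.trim_matches('\'').to_ascii_uppercase();
    format!("EXTRACT({part} FROM {expr_sql})")
}

fn main() {
    // Mirrors the Before/After row in the table above.
    println!("{}", date_part_to_extract("'YEAR'", "l_shipdate"));
}
```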
   
   ### 2. `interval_style` → `SQLStandard`
   
   BigQuery does not support PostgreSQL-style interval abbreviations such as `MONS`, so TPC-H Q4 and Q20 fail with `Syntax error: Unexpected ")"`.
   
   | Before (error) | After |
   |---|---|
   | `INTERVAL '3 MONS'` | `INTERVAL '3' MONTH` |
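
The sketch below shows the shape of SQL-standard output for single-unit intervals. `Interval` and `to_sql_standard` are hypothetical and model only whole months and days; the real unparser works from DataFusion's interval scalar values and may render multi-unit intervals differently.

```rust
/// Hypothetical, simplified interval value (not DataFusion's or Arrow's type):
/// only whole months and whole days are modeled.
struct Interval {
    months: i32,
    days: i32,
}

/// Renders a single-unit interval in SQL-standard form, e.g. INTERVAL '3' MONTH,
/// instead of a PostgreSQL-style string such as INTERVAL '3 MONS'.
/// Returns None for mixed month/day intervals, which this sketch does not cover.
fn to_sql_standard(iv: &Interval) -> Option<String> {
    match (iv.months, iv.days) {
        (0, 0) => Some("INTERVAL '0' DAY".to_string()),
        (m, 0) => Some(format!("INTERVAL '{m}' MONTH")),
        (0, d) => Some(format!("INTERVAL '{d}' DAY")),
        _ => None,
    }
}

fn main() {
    let three_months = Interval { months: 3, days: 0 };
    println!("{:?}", to_sql_standard(&three_months));
}
```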
   
   ### 3. `float64_ast_dtype` → `Float64`
   
   BigQuery does not support `DOUBLE`; casts to it fail with `Type not found: DOUBLE`.
   
   | Before (error) | After |
   |---|---|
   | `CAST(a AS DOUBLE)` | `CAST(a AS FLOAT64)` |
   
   ### 4. `supports_column_alias_in_table_alias` → `false`
   
   BigQuery does not support column aliases in table alias definitions; queries that use them fail with `Expected ")" but got "("`.
   
   | Before (error) | After |
   |---|---|
   | `SELECT c.key FROM (...) AS c(key)` | `SELECT c.key FROM (SELECT o_orderkey AS key FROM orders) AS c` |
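
A string-level sketch of the rewrite in the table: the column aliases are pushed into the subquery's projection so the table alias no longer carries a column list. `inline_column_aliases` is a hypothetical helper and assumes the alias list maps positionally onto the projected expressions; DataFusion performs this on AST nodes, not strings.

```rust
/// Hypothetical helper: instead of `(SELECT o_orderkey FROM orders) AS c(key)`,
/// emit `(SELECT o_orderkey AS key FROM orders) AS c`.
fn inline_column_aliases(
    projection: &[&str],
    from: &str,
    table_alias: &str,
    column_aliases: &[&str],
) -> Option<String> {
    // Column aliases must map one-to-one (by position) onto projected expressions.
    if projection.len() != column_aliases.len() {
        return None;
    }
    let items: Vec<String> = projection
        .iter()
        .zip(column_aliases)
        .map(|(expr, alias)| format!("{expr} AS {alias}"))
        .collect();
    Some(format!("(SELECT {} FROM {from}) AS {table_alias}", items.join(", ")))
}

fn main() {
    let sql = inline_column_aliases(&["o_orderkey"], "orders", "c", &["key"]);
    println!("{sql:?}");
}
```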
   
   ### 5. `utf8_cast_dtype` + `large_utf8_cast_dtype` → `String`
   
   BigQuery does not support `VARCHAR` or `TEXT`; casts fail with `Type not found: VARCHAR` and `Type not found: Text`.
   
   | Before (error) | After |
   |---|---|
   | `CAST(a AS VARCHAR)` | `CAST(a AS STRING)` |
   | `CAST(a AS TEXT)` | `CAST(a AS STRING)` |
   
   ### 6. `int64_cast_dtype` → `Int64`
   
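   BigQuery does not support `BIGINT`; `CAST(a AS BIGINT)` fails with `Type not found: BIGINT`. For reference, the BigQuery data types documentation gives `INT64` as the canonical SQL type name:

   > A 64-bit integer.
   > - SQL type name: `INT64`
   > - SQL aliases: `INT`, `SMALLINT`, `INTEGER`, `BIGINT`, `TINYINT`, `BYTEINT`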
   
   | Before (error) | After |
   |---|---|
   | `CAST(a AS BIGINT)` | `CAST(a AS INT64)` |
   
   ### 7. `timestamp_cast_dtype` → `Timestamp` (no timezone qualifier)
   
   
   Reference: https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
   
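   BigQuery does not support `TIMESTAMP WITH TIME ZONE`; casts fail with `Syntax error: Expected ')' or keyword FORMAT but got keyword WITH`. Plain `TIMESTAMP` should be used instead: a BigQuery `TIMESTAMP` represents an absolute point in time, so the instant is preserved even though the time zone itself is not stored.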
   
   | Before (error) | After |
   |---|---|
   | `CAST(a AS TIMESTAMP WITH TIME ZONE)` | `CAST(a AS TIMESTAMP)` |
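
Sections 3, 5, 6, and 7 share one pattern: the trait supplies a default type name and `BigQueryDialect` substitutes BigQuery's own. The sketch below models that pattern with a hypothetical `CastTypeNames` trait and a simplified `ArrowType` enum. It is not DataFusion's `Dialect` trait, whose real methods (`float64_ast_dtype`, `utf8_cast_dtype`, and so on) return `sqlparser` AST types rather than strings.

```rust
/// Hypothetical stand-in for the Arrow types involved (not the arrow-rs enum).
#[derive(Debug, Clone, Copy)]
enum ArrowType {
    Float64,
    Utf8,
    LargeUtf8,
    Int64,
    /// A timestamp that carries a time zone.
    TimestampTz,
}

/// Hypothetical mini "dialect" trait. Default methods return the type names from
/// the Before columns above; a dialect overrides only the names it needs.
trait CastTypeNames {
    fn float64(&self) -> &'static str { "DOUBLE" }
    fn utf8(&self) -> &'static str { "VARCHAR" }
    fn large_utf8(&self) -> &'static str { "TEXT" }
    fn int64(&self) -> &'static str { "BIGINT" }
    fn timestamp_tz(&self) -> &'static str { "TIMESTAMP WITH TIME ZONE" }

    /// Renders CAST(<expr> AS <type>) with this dialect's type names.
    fn cast_sql(&self, expr: &str, ty: ArrowType) -> String {
        let name = match ty {
            ArrowType::Float64 => self.float64(),
            ArrowType::Utf8 => self.utf8(),
            ArrowType::LargeUtf8 => self.large_utf8(),
            ArrowType::Int64 => self.int64(),
            ArrowType::TimestampTz => self.timestamp_tz(),
        };
        format!("CAST({expr} AS {name})")
    }
}

/// Uses every default.
struct DefaultDialect;
impl CastTypeNames for DefaultDialect {}

/// Overrides the names BigQuery rejects, matching the After columns above.
struct BigQuery;
impl CastTypeNames for BigQuery {
    fn float64(&self) -> &'static str { "FLOAT64" }
    fn utf8(&self) -> &'static str { "STRING" }
    fn large_utf8(&self) -> &'static str { "STRING" }
    fn int64(&self) -> &'static str { "INT64" }
    fn timestamp_tz(&self) -> &'static str { "TIMESTAMP" }
}

fn main() {
    let types = [
        ArrowType::Float64,
        ArrowType::Utf8,
        ArrowType::LargeUtf8,
        ArrowType::Int64,
        ArrowType::TimestampTz,
    ];
    for ty in types {
        // Print the default rendering next to the BigQuery rendering.
        println!("{:<40} -> {}", DefaultDialect.cast_sql("a", ty), BigQuery.cast_sql("a", ty));
    }
}
```

Keeping the defaults on the trait means a dialect only states where it differs, which is the same design the PR relies on when it overrides individual `Dialect` methods.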
   
   ## Are these changes tested?
   
   Yes. Added the `test_bigquery_dialect_overrides` unit test, which covers all eight overrides; the generated SQL was verified against BigQuery.
   
   ## Are there any user-facing changes?
   
   No API changes. `BigQueryDialect` now generates valid BigQuery SQL for the affected expressions.
   

