davisp opened a new pull request, #10590:
URL: https://github.com/apache/datafusion/pull/10590

   ## Which issue does this PR close?
   
   Closes #10589
   
   ## Rationale for this change
   
   Provide per-column key/value options in the `CREATE EXTERNAL TABLE` statement.
   
   ## What changes are included in this PR?
   
   This stores the key/value pairs from the BigQuery-style column `OPTIONS` 
clause in the metadata of the corresponding Arrow Schema fields.
   
   ## Are these changes tested?
   
   Barely. I assume I'll need to add a couple more, but I figured I'd wait 
until I see what sort of enthusiasm this proposal receives.
   
   ## Are there any user-facing changes?
   
   Users can now access any parsed BigQuery-style column options through each 
Field's metadata. This gives TableProvider/TableProviderFactory implementations 
the ability to accept per-field options. For a query such as:
   
   ```sql
   CREATE EXTERNAL TABLE test(
     col1 bigint NULL,
     col2 bigint NOT NULL OPTIONS(compression='zstd(5)')
   )
   STORED AS parquet
   LOCATION 'foo.parquet';
   ```
   
   This results in the field 'col2' on the Arrow Schema having a metadata value 
of `{"sql_option.compression", "'zstd(5)'"}`.
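   
   As a rough illustration of how a TableProviderFactory might consume these 
entries, here is a minimal sketch. It uses a plain `HashMap<String, String>` as a 
stand-in for a field's metadata map, and the helper names `column_options` and 
`unquote` are hypothetical, not part of this PR. It assumes only what is shown 
above: keys carry the `sql_option.` prefix and values keep their SQL quoting.
   
   ```rust
   use std::collections::HashMap;

   /// Prefix shown in this PR description for keys derived from column `OPTIONS`.
   const SQL_OPTION_PREFIX: &str = "sql_option.";

   /// Hypothetical helper: collect the per-column options from one field's
   /// metadata map, stripping the prefix from each key. Entries without the
   /// prefix are left out.
   fn column_options(metadata: &HashMap<String, String>) -> HashMap<String, String> {
       metadata
           .iter()
           .filter_map(|(key, value)| {
               key.strip_prefix(SQL_OPTION_PREFIX)
                   .map(|name| (name.to_string(), value.clone()))
           })
           .collect()
   }

   /// Hypothetical helper: remove one pair of surrounding single quotes, if
   /// present. Assumes values arrive quoted as in the example above.
   fn unquote(value: &str) -> &str {
       value
           .strip_prefix('\'')
           .and_then(|rest| rest.strip_suffix('\''))
           .unwrap_or(value)
   }
   ```
   
   With the example table, calling `column_options` on the metadata of 'col2' 
would yield a single `compression` entry whose value is still `'zstd(5)'`; 
`unquote` is one way a provider could strip the quotes before interpreting it.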
   
   Technically, I think this could break downstream users in the following 
scenario:
   
   1. For some reason they are using BigQuery `OPTIONS` clauses in their 
`CREATE EXTERNAL TABLE` column definitions.
   2. In their TableProviderFactory, they are attempting to access and use 
Field metadata in such a way that the now-present `sql_option.*` keys break 
things.
   
   The first requirement here seems fairly unlikely to me. I'm making the rough 
assumption that a "standard" BigQuery table creation statement would fail to 
parse as a DataFusion `CREATE EXTERNAL TABLE` statement, which would limit this 
to people who have translated a BigQuery statement to a DataFusion statement 
while keeping their column `OPTIONS`, even though those options don't currently 
get passed to the TableProviderFactory.
   
   The second requirement, on the other hand, I could see being rather common 
(a `MyThing::from(arrow_schema)` that blows up on unexpected metadata). Still, 
the intersection of these two sets seems like it'd be tiny.
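   
   For anyone who does land in that intersection, one possible mitigation is to 
drop the new keys before handing metadata to strict code. The sketch below is 
only an assumption about what such downstream code might look like: 
`parse_strict` is a hypothetical consumer that rejects unknown keys, 
`without_sql_options` is a hypothetical filter, and a plain 
`HashMap<String, String>` stands in for a field's metadata.
   
   ```rust
   use std::collections::HashMap;

   /// Hypothetical strict consumer: accepts only the `description` key and
   /// returns an error for anything else it finds in the metadata.
   fn parse_strict(metadata: &HashMap<String, String>) -> Result<Option<String>, String> {
       let mut description = None;
       for (key, value) in metadata {
           match key.as_str() {
               "description" => description = Some(value.clone()),
               other => return Err(format!("unexpected metadata key: {}", other)),
           }
       }
       Ok(description)
   }

   /// Hypothetical filter: copy the metadata, leaving out every `sql_option.*`
   /// entry so the strict consumer never sees them.
   fn without_sql_options(metadata: &HashMap<String, String>) -> HashMap<String, String> {
       metadata
           .iter()
           .filter(|(key, _)| !key.starts_with("sql_option."))
           .map(|(key, value)| (key.clone(), value.clone()))
           .collect()
   }
   ```
   
   Calling `parse_strict` directly on metadata that contains 
`sql_option.compression` returns an error, while 
`parse_strict(&without_sql_options(&metadata))` behaves as it did before the new 
keys existed.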


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

