serramatutu opened a new pull request, #3604: URL: https://github.com/apache/arrow-adbc/pull/3604
## Motivation

The `Type` metadata key has two limitations, both of which stem from BigQuery's API:

1. It reports fields of type `ARRAY<T>` as just `T` with `Repeated=true`.
2. It reports `STRUCT<...>` fields simply as `RECORD`, erasing all information about the inner fields.

These limitations can cause problems when parsing the `Type` key, or when using it verbatim against the warehouse in a statement, e.g. a `CREATE TABLE` statement or an `AS T` cast.

## Summary

This PR adds a new `BIGQUERY:type` key that holds the field's type formatted as the original SQL type string, as specified by BigQuery. Most types remain unchanged as they come from `gobigquery`, and in those cases this key contains the same value as `Type`. Arrays and structs, however, are transformed to match the richer type string.

## Testing

I ran a `CREATE TABLE AS` query against BigQuery. Here are the results for fields of different types:

[1] Regular non-nested types are simply copied over from the value of `Type`.

<details>
<summary>1</summary>
<img width="331" height="1071" alt="image" src="https://github.com/user-attachments/assets/ccd2ce17-37d8-4630-bef5-a503ed450c2a" />
</details>

[2] An array of integers becomes `ARRAY<INTEGER>`, while `Type` remains `INTEGER`.

<details>
<summary>2</summary>
<img width="319" height="369" alt="image" src="https://github.com/user-attachments/assets/e588d7ac-c7ca-40fb-ab51-9795e566d240" />
</details>

[3] An array of structs becomes `ARRAY<STRUCT<...>>`.

<details>
<summary>3</summary>
<img width="551" height="816" alt="image" src="https://github.com/user-attachments/assets/bb946ebc-747a-4529-88a8-68636f94e44e" />
</details>

[4] In a struct of arrays, the inner types are `ARRAY<...>`.

<details>
<summary>4</summary>
<img width="610" height="922" alt="image" src="https://github.com/user-attachments/assets/932a3554-ea56-4b1f-8642-801ee91c4f63" />
</details>

[5] A deeply nested struct also has the correct inner types.

<details>
<summary>5</summary>
<img width="1327" height="1307" alt="image"
src="https://github.com/user-attachments/assets/3185651b-8809-42b0-adc4-ec956eaf9e87" />
</details>

## Related issues

- https://github.com/apache/arrow-adbc/issues/3449
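
## Illustrative sketch

For readers who want a concrete picture of the array and struct transformation described in the Summary, here is a minimal Go sketch. It is not the code from this PR: the `field` struct and the `sqlType` function are hypothetical names that only mirror the shape of the information BigQuery reports (a base type, a repeated flag, and nested fields), and the exact formatting of struct members (`name TYPE`, comma-separated, no quoting of field names) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical stand-in for the per-column schema information:
// a base type name, a repeated flag, and nested fields for records.
type field struct {
	Name     string
	Type     string  // e.g. "INTEGER", "STRING", "RECORD"
	Repeated bool    // true when the column is an array of Type
	Fields   []field // populated only when Type is "RECORD"
}

// sqlType renders f as a nested SQL-style type string.
// Records become STRUCT<...> (recursively), and repeated fields are
// wrapped in ARRAY<...>. All other types are passed through unchanged.
func sqlType(f field) string {
	t := f.Type
	if f.Type == "RECORD" {
		parts := make([]string, 0, len(f.Fields))
		for _, child := range f.Fields {
			// Assumption: members are written as "name TYPE".
			parts = append(parts, child.Name+" "+sqlType(child))
		}
		t = "STRUCT<" + strings.Join(parts, ", ") + ">"
	}
	if f.Repeated {
		t = "ARRAY<" + t + ">"
	}
	return t
}

func main() {
	ints := field{Name: "ids", Type: "INTEGER", Repeated: true}
	fmt.Println(sqlType(ints)) // ARRAY<INTEGER>

	rows := field{
		Name:     "rows",
		Type:     "RECORD",
		Repeated: true,
		Fields: []field{
			{Name: "a", Type: "INTEGER"},
			{Name: "b", Type: "STRING", Repeated: true},
		},
	}
	fmt.Println(sqlType(rows)) // ARRAY<STRUCT<a INTEGER, b ARRAY<STRING>>>
}
```

The recursion handles struct contents first and applies the `ARRAY<...>` wrapper last, so a repeated record comes out as `ARRAY<STRUCT<...>>` rather than the other way around, matching cases [3] and [4] above.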
