lquerel opened a new issue #2043:
URL: https://github.com/apache/arrow-datafusion/issues/2043


   DataFusion already gives us a way to access nested fields with this syntax 
field["nested_field"] (field being an arrow struct field). However I didn't 
find any example or documentation to do the reverse operation, i.e. creating a 
struct from multiple fields. A pseudo SQL query (inspired from BIGQUERY) will 
be something like: `SELECT STRUCT(a AS field_1, b AS field_2) AS struct_name 
FROM table`. Is there a way to express this type of construct with DataFusion 
SQL?
   
   Support for this type of functionality will be particularly helpful for 
dealing with data sources that support nested fields (e.g. parquet and JSON) 
and when the query output MUST also be stored in a nested form. Example of use 
cases:
   - Remove privacy sensitive fields from a struct field.
   - Apply some transformations on some nested fields.
   - Compute new nested fields based on the existing nested fields (e.g. add 
country, city nested fields from an ip_address nested field).  
   
   Proposed solution:
   - Add a new operator to create a field of type Struct. Simple syntax: 
`STRUCT (expr AS field_name, ...) AS struct_field_name`
   - Support an EXCEPT keyword to remove one or several fields from a struct. 
e.g. `SELECT STRUCT (struct_a.* EXCEPT (field_1, field_2)) FROM xyz`. This type 
of syntax is robust to schema evolution as new fields will be automatically 
captured by the wildcard operator.
   - Support wildcard operators to combine multiple structs into a single one. 
e.g. `SELECT STRUCT(struct_a.*, struct_b.*) AS combined_struct FROM xyz`. This 
must fail if the structures a and b have fields with the same name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to