roman-karlstetter-ifta commented on issue #37358: URL: https://github.com/apache/arrow/issues/37358#issuecomment-4376474999
One use case in which we use struct is the following: We use arrow IPC to store time series data. We also store aggregations of these time series, and for that, we derive the schema and store the actual aggregations as nested fields. Example: Schema: `| TS | col1 | col2 | ... |` Derived Schema for storing aggregated data: ``` | TS | COUNT | col1 | col2 | ... | | | | max | min | mean | max | min | mean | ... | ``` In this case, `col1` and `col2` are of type `struct` with the nested fields `max`, `min` and `mean` For certain queries, we're only interested in, say, max aggregations, so we only want to load `TS`, `col1.max`, `col2.max`, and the resulting schema of the returned record batches would be the following ``` | TS | col1 | col2 | | | max | max | ``` Currently, only "full" struct columns can be specified via `IpcReadOptions::included_fields`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
