tustvold commented on issue #2677: URL: https://github.com/apache/arrow-rs/issues/2677#issuecomment-1242902278
This will not allow operations on data with multiple schemas, the same as with RecordBatch. That being said, for rows with different variants, the JSON reader inserts nulls for the columns that appear in other variants but are missing from the current record, so the data is effectively unified to a single schema. This trades memory efficiency for the ability to efficiently process data in a columnar fashion, without per-value dynamic dispatch. This is likely acceptable; however, partitioning the data so that different schemas aren't interleaved will lead to better performance and memory efficiency. The row format could help with implementing this, but it does not alter the nature of the arrow schema.
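To illustrate the unification described above, here is a minimal std-only sketch. It is not the arrow-rs JSON reader or any arrow API. The names `Record` and `unify` are hypothetical, and values are restricted to `i64` to keep it short. It only shows the idea: the unified schema is the union of all field names, and each record contributes one slot to every column, with `None` standing in for a null.

```rust
use std::collections::BTreeMap;

/// One input record: field name -> value.
/// Hypothetical type for this sketch; integers only to keep it small.
type Record = BTreeMap<String, i64>;

/// Unify records with differing fields into columns over a single schema.
/// Every column gets one slot per record; a field absent from a record
/// becomes `None`, which plays the role of a null here.
fn unify(records: &[Record]) -> BTreeMap<String, Vec<Option<i64>>> {
    let mut columns: BTreeMap<String, Vec<Option<i64>>> = BTreeMap::new();

    // First pass: the unified schema is the union of all field names.
    for record in records {
        for name in record.keys() {
            columns.entry(name.clone()).or_default();
        }
    }

    // Second pass: append one slot per record to every column.
    for record in records {
        for (name, column) in columns.iter_mut() {
            column.push(record.get(name).copied());
        }
    }

    columns
}
```

Calling `unify` on two records, one with only `x` and one with only `y`, gives two columns of length two, each with one `None`. Those padding slots are the memory cost mentioned above, and they grow with the number of interleaved variants, which is why partitioning by schema helps.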
