[GitHub] [arrow-rs] tustvold commented on issue #2677: Arrow Row Format

GitBox Sun, 11 Sep 2022 00:07:36 -0700


tustvold commented on issue #2677:
URL: https://github.com/apache/arrow-rs/issues/2677#issuecomment-1242902278


   This will not allow operations on data with multiple schemas, the same as 
with RecordBatch.
   
   That being said, in the case of rows with different variants, nulls will be 
inserted by the JSON reader for the columns found in other variants but not 
present in the current record. The data is effectively unified to a single 
schema.
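
The unification described above can be sketched as follows. This is only an illustration using the Python standard library, not the arrow-rs JSON reader; `unify_records` is a hypothetical helper and the sample records are made up.

```python
import json


def unify_records(lines):
    """Turn newline-delimited JSON objects into columns over one unified schema.

    Hypothetical helper for illustration; it is not part of any Arrow API.
    """
    records = [json.loads(line) for line in lines]

    # Unified schema: every field seen in any record, in first-seen order.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)

    # One column per field. A record that lacks the field contributes None,
    # which plays the role of the null inserted by the reader.
    return {name: [record.get(name) for record in records] for name in fields}


columns = unify_records([
    '{"id": 1, "name": "a"}',
    '{"id": 2, "score": 0.5}',
])
```

Here `columns` holds three columns of length two, and the `name` and `score` columns each carry one null for the record that did not contain that field.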
   
   This trades off memory efficiency for the ability to efficiently process 
data in a columnar fashion, without per-value dynamic dispatch.
   
   This is likely acceptable; however, partitioning the data so that different 
schemas aren't interleaved will lead to better performance and memory 
efficiency. The row format could help with implementing this, but does not 
alter the nature of Arrow schemas.
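
One way to picture the partitioning suggested above is to group records by the set of fields they contain before building any columns. This is a minimal sketch under that assumption, using only the Python standard library; `partition_by_schema` is a hypothetical name, and it does not use the row format or any arrow-rs API.

```python
import json
from collections import defaultdict


def partition_by_schema(lines):
    """Group newline-delimited JSON objects by their sorted field names.

    Hypothetical helper for illustration; the sorted tuple of field names
    stands in for a schema.
    """
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        schema_key = tuple(sorted(record))
        groups[schema_key].append(record)
    return dict(groups)


parts = partition_by_schema([
    '{"id": 1, "name": "a"}',
    '{"id": 2, "score": 0.5}',
    '{"id": 3, "name": "b"}',
])
```

Each group can then be converted to columns on its own, so no group needs null padding for fields that only appear in other variants.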


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

