[GitHub] [arrow-datafusion] yjshen commented on pull request #1782: Introduce `Row` format backed by raw bytes

GitBox Wed, 09 Feb 2022 02:14:35 -0800


yjshen commented on pull request #1782:
URL: 
https://github.com/apache/arrow-datafusion/pull/1782#issuecomment-1033592720



   Thanks @alamb for the write-up of row use cases. 
   
   The current row implementation mostly targets payload use cases, where we 
do not update or look up fields by index after the first write, and only 
decompose rows back into a record batch at the end. This is the case for the 
sort payload, the composed grouping key in hash aggregate (we can directly 
compare raw bytes for equality), the hash join key, and the join payload.
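
   As a rough illustration of the grouping-key case, here is a minimal 
sketch in which a composed key is written once and then only hashed and 
compared as raw bytes. The names `encode_group_key` and `count_by_group`, and 
the (u32, string) key layout, are assumptions made for this example and are 
not part of the PR.

```rust
use std::collections::HashMap;

/// Hypothetical encoder: writes a (u32, string) grouping key as raw bytes.
/// The string is length-prefixed so that distinct keys never collide.
fn encode_group_key(id: u32, name: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + name.len());
    key.extend_from_slice(&id.to_le_bytes());
    key.extend_from_slice(&(name.len() as u32).to_le_bytes());
    key.extend_from_slice(name.as_bytes());
    key
}

/// Counts rows per group. Equality and hashing operate on the raw bytes only;
/// the key is never decoded or updated after the first write.
fn count_by_group(rows: &[(u32, &str)]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for (id, name) in rows {
        *counts.entry(encode_group_key(*id, name)).or_insert(0u64) += 1;
    }
    counts
}
```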
   
   We should change it slightly to use word-aligned initialization and 
updates for aggregation state (to be CPU friendly), much as you 
suggested:
   ``` 
   let state = RowWriter::new()
     .for_aggregate(aggregate_exprs)
     .build();
   ```
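
   To make "word-aligned updates" concrete, here is a minimal sketch of an 
aggregation state stored as raw bytes, where every field occupies one 8-byte 
slot so each update touches exactly one word-sized range. `AggState` and its 
methods are hypothetical names for this example, not the API proposed above, 
and the sketch only aligns offsets within the buffer.

```rust
/// Hypothetical aggregation state: a byte buffer of fixed 8-byte slots.
struct AggState {
    buf: Vec<u8>,
}

impl AggState {
    /// Allocates `slots` zero-initialized 8-byte slots.
    fn new(slots: usize) -> Self {
        Self { buf: vec![0u8; slots * 8] }
    }

    /// Reads the i64 stored in `slot` (offset is always a multiple of 8).
    fn get_i64(&self, slot: usize) -> i64 {
        let start = slot * 8;
        let mut word = [0u8; 8];
        word.copy_from_slice(&self.buf[start..start + 8]);
        i64::from_le_bytes(word)
    }

    /// Overwrites the i64 stored in `slot` in place.
    fn set_i64(&mut self, slot: usize, value: i64) {
        let start = slot * 8;
        self.buf[start..start + 8].copy_from_slice(&value.to_le_bytes());
    }

    /// In-place accumulate, e.g. for SUM or COUNT.
    fn add_i64(&mut self, slot: usize, delta: i64) {
        let current = self.get_i64(slot);
        self.set_i64(slot, current + delta);
    }
}
```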
   
   For a composite sort key with no varlena, we should remove the null-bits 
part, pad null attributes with all zeros or all ones (according to the 
nulls-first or nulls-last sort option), and compare raw bytes.
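
   A minimal sketch of this idea for nullable `i32` columns follows. It 
assumes an order-preserving encoding (big-endian with the sign bit flipped), 
which the comment above does not spell out; `NullOrder`, `encode_i32`, and 
`encode_key` are hypothetical names. Note that in this sketch a null is 
indistinguishable from `i32::MIN` (nulls first) or `i32::MAX` (nulls last).

```rust
/// Where NULLs sort relative to non-null values (hypothetical enum).
#[derive(Clone, Copy)]
enum NullOrder {
    First,
    Last,
}

/// Appends 4 bytes whose unsigned lexicographic order matches ascending
/// i32 order: big-endian with the sign bit flipped. Nulls are padded with
/// all zeros (nulls first) or all ones (nulls last); no null bit is stored.
fn encode_i32(value: Option<i32>, nulls: NullOrder, out: &mut Vec<u8>) {
    match value {
        Some(v) => {
            let flipped = (v as u32) ^ 0x8000_0000;
            out.extend_from_slice(&flipped.to_be_bytes());
        }
        None => {
            let pad = match nulls {
                NullOrder::First => 0x00,
                NullOrder::Last => 0xFF,
            };
            out.extend_from_slice(&[pad; 4]);
        }
    }
}

/// Encodes a composite key; keys can then be ordered by raw byte comparison.
fn encode_key(cols: &[Option<i32>], nulls: NullOrder) -> Vec<u8> {
    let mut out = Vec::with_capacity(cols.len() * 4);
    for col in cols {
        encode_i32(*col, nulls, &mut out);
    }
    out
}
```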
   
   For a composite sort key where variable-length attributes (varlena) exist 
and are not the last attribute, direct comparison of the raw bytes of the 
current row format doesn't work. We need to store each varlena in place and 
pad all sort keys to the longest width, so that we can compare them directly 
as raw bytes.
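
   A minimal sketch of the padding approach for a (string, u32) sort key 
follows. The function name `encode_padded_keys` and the choice of zero bytes 
as padding are assumptions for this example; with zero padding, a string 
that ends in NUL bytes would compare equal to its shorter prefix.

```rust
/// Encodes (string, u32) sort keys so that raw byte comparison matches
/// ordering by string first, then by integer. Every string is stored in
/// place and zero-padded to the longest string width in the batch, so the
/// integer column starts at the same offset in every key.
fn encode_padded_keys(rows: &[(&str, u32)]) -> Vec<Vec<u8>> {
    let width = rows.iter().map(|(s, _)| s.len()).max().unwrap_or(0);
    rows.iter()
        .map(|(s, n)| {
            let mut key = Vec::with_capacity(width + 4);
            key.extend_from_slice(s.as_bytes());
            key.resize(width, 0x00); // pad to the longest width
            key.extend_from_slice(&n.to_be_bytes()); // big-endian keeps order
            key
        })
        .collect()
}
```

   Without the padding, the bytes of the integer that follows a short string 
would be compared against the remaining characters of a longer string, which 
can produce the wrong order.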
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
