[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1782: Introduce `Row` format backed by raw bytes

GitBox Wed, 09 Feb 2022 02:17:47 -0800


yjshen edited a comment on pull request #1782:
URL: https://github.com/apache/arrow-datafusion/pull/1782#issuecomment-1033592720



   Thanks @alamb for the write-up of row use cases. 
   
   **[Use directly]** The current row implementation mostly targets payload 
use cases, where we neither update nor look up fields by index after the first 
write and only decompose rows into a record batch at the end. This covers the 
sort payload, the composite grouping key in hash aggregate (raw bytes can be 
compared directly for equality), the hash join key, and the join payload.
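
To illustrate the raw-bytes equality point for grouping keys, here is a minimal sketch. The byte layout (4-byte id, 4-byte string length, string bytes) and the names `encode_group_key` and `count_by_group` are assumptions made up for this example; they are not the row format or API from the PR.

```rust
use std::collections::HashMap;

/// Serialize a (u32, &str) grouping key into one byte buffer.
/// Hypothetical layout for this sketch: id, string length, string bytes.
fn encode_group_key(id: u32, name: &str) -> Vec<u8> {
    let mut row = Vec::with_capacity(8 + name.len());
    row.extend_from_slice(&id.to_le_bytes());
    row.extend_from_slice(&(name.len() as u32).to_le_bytes());
    row.extend_from_slice(name.as_bytes());
    row
}

/// Count rows per group. The map hashes and compares the encoded bytes,
/// so no per-column comparison is needed.
fn count_by_group(rows: &[(u32, &str)]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for (id, name) in rows {
        *counts.entry(encode_group_key(*id, name)).or_insert(0u64) += 1;
    }
    counts
}
```

The length prefix keeps keys such as `(1, "ab")` and `(1, "a")` from colliding, which is what makes byte equality equivalent to key equality in this sketch.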
   
   **[Minor adapt]** The format needs a small change for aggregation state: 
fields should be initialized and updated at word-aligned offsets (to be CPU 
friendly), much as you suggested:
   ``` 
   let state = RowWriter::new()
     .for_aggregate(aggregate_exprs)
     .build();
   ```
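
As a rough sketch of what word-aligned state could look like, the example below gives every field an 8-byte slot so that field `i` always starts at offset `i * 8` from the start of the row. `AggState` and its methods are hypothetical names for illustration, not the builder API shown above, and the sketch aligns offsets within the buffer only; it does not control the buffer's own address alignment.

```rust
/// Width of one slot in bytes (one 64-bit word).
const WORD: usize = 8;

/// Aggregation state stored as raw bytes, one word per field.
struct AggState {
    buf: Vec<u8>,
}

impl AggState {
    /// Allocate a zero-initialized state with `num_fields` word-sized slots.
    fn new(num_fields: usize) -> Self {
        Self { buf: vec![0u8; num_fields * WORD] }
    }

    /// Read field `idx` as an i64.
    fn get_i64(&self, idx: usize) -> i64 {
        let start = idx * WORD;
        let mut word = [0u8; WORD];
        word.copy_from_slice(&self.buf[start..start + WORD]);
        i64::from_le_bytes(word)
    }

    /// Overwrite field `idx` in place.
    fn set_i64(&mut self, idx: usize, value: i64) {
        let start = idx * WORD;
        self.buf[start..start + WORD].copy_from_slice(&value.to_le_bytes());
    }

    /// Update field `idx` in place, as a COUNT or SUM accumulator would.
    fn add_i64(&mut self, idx: usize, delta: i64) {
        let current = self.get_i64(idx);
        self.set_i64(idx, current + delta);
    }
}
```

With slot 0 used as a count and slot 1 as a sum, each incoming value becomes `state.add_i64(0, 1)` followed by `state.add_i64(1, value)`, and no field ever has to move.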
   
   **[Minor adapt]** For a composite sort key with no varlena, we should drop 
the null-bits part, pad the bytes of each null attribute with all 0xFF or all 
0x00 (depending on the nulls-first or nulls-last sort option), and compare raw 
bytes.
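
A minimal sketch of such a fixed-width key, restricted to nullable `i32` columns sorted ascending, might look like the following. The big-endian encoding with a flipped sign bit is an assumption added here so that byte order matches numeric order; the comment itself only specifies the null padding. `NullOrder`, `encode_i32`, and `encode_key` are hypothetical names.

```rust
/// Where nulls should sort relative to non-null values.
#[derive(Clone, Copy)]
enum NullOrder {
    First,
    Last,
}

/// Encode one nullable i32 as 4 bytes whose unsigned byte-wise order
/// matches the ascending order of the values.
fn encode_i32(value: Option<i32>, nulls: NullOrder) -> [u8; 4] {
    match value {
        // Flip the sign bit so negatives sort before positives, then use
        // big-endian so the most significant byte is compared first.
        Some(v) => ((v as u32) ^ 0x8000_0000).to_be_bytes(),
        // No null bit: a null is just padding at one end of the byte range.
        None => match nulls {
            NullOrder::First => [0x00; 4],
            NullOrder::Last => [0xFF; 4],
        },
    }
}

/// Concatenate the fixed-width encodings of all key columns.
fn encode_key(columns: &[Option<i32>], nulls: NullOrder) -> Vec<u8> {
    let mut key = Vec::with_capacity(columns.len() * 4);
    for column in columns {
        key.extend_from_slice(&encode_i32(*column, nulls));
    }
    key
}
```

Two encoded keys can then be ordered with a plain slice comparison (`a.cmp(&b)`). One caveat of dropping the null bits in this sketch: a null becomes indistinguishable from `i32::MIN` (nulls first) or `i32::MAX` (nulls last).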
   
   **[NOT FIT]** For a composite sort key that contains variable-length 
attributes (varlena) anywhere but the last position, directly comparing the raw 
bytes of the current row format does not work. We would need to store each 
varlena in place and pad all sort keys to the longest width, after which raw 
bytes can be compared directly.
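
The padding idea can be sketched for a (string, u32) key as below. Padding with 0x00 and encoding the integer big-endian are assumptions for this example, and `encode_padded_keys` is a hypothetical name; strings that themselves end in NUL bytes would need extra handling that this sketch omits.

```rust
/// Build fixed-width sort keys for (string, u32) rows: the string is stored
/// in place and zero-padded to the longest string in the batch, followed by
/// the integer in big-endian order.
fn encode_padded_keys(rows: &[(&str, u32)]) -> Vec<Vec<u8>> {
    // Longest string width across all rows (0 for an empty batch).
    let width = rows.iter().map(|(s, _)| s.len()).max().unwrap_or(0);
    rows.iter()
        .map(|(s, n)| {
            let mut key = Vec::with_capacity(width + 4);
            key.extend_from_slice(s.as_bytes());
            // Pad so the next column starts at the same offset in every key.
            key.resize(width, 0x00);
            key.extend_from_slice(&n.to_be_bytes());
            key
        })
        .collect()
}
```

Because every key has the same length and each column starts at the same offset, comparing two keys byte by byte gives the same result as comparing the string first and the integer second.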
   
   


