yjshen edited a comment on pull request #1782:
URL:
https://github.com/apache/arrow-datafusion/pull/1782#issuecomment-1033592720
Thanks @alamb for the write-up of row use cases.
**[Use directly]** The current row implementation mostly targets payload
use cases, where we neither update nor look up fields by index after the first
write and only decompose rows back into a record batch at the end. This covers
the sort payload, the composite grouping key in hash aggregation (raw bytes can
be compared directly for equality), the hash join key, and the join payload.
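To illustrate the grouping-key case, here is a minimal sketch of using the encoded row bytes directly as the hash map key. The `encode_key` helper and its layout (one null-bit byte, a 4-byte integer, then the string bytes) are assumptions made up for this example, not the actual row format in this PR:

```rust
use std::collections::HashMap;

/// Hypothetical encoder for a two-column grouping key (i32, nullable string).
/// Assumed layout: 1 null-bit byte, 4-byte little-endian int, string bytes.
fn encode_key(a: i32, b: Option<&str>) -> Vec<u8> {
    let mut row = Vec::with_capacity(5 + b.map_or(0, |s| s.len()));
    // Null-bit byte: 1 if the string column is present, 0 if it is null.
    row.push(if b.is_some() { 1 } else { 0 });
    row.extend_from_slice(&a.to_le_bytes());
    if let Some(s) = b {
        row.extend_from_slice(s.as_bytes());
    }
    row
}

/// Count rows per group. Two rows land in the same group exactly when their
/// encoded bytes are equal, so no per-column comparison is needed.
fn count_groups(input: &[(i32, Option<&str>)]) -> HashMap<Vec<u8>, u64> {
    let mut groups = HashMap::new();
    for &(a, b) in input {
        *groups.entry(encode_key(a, b)).or_insert(0u64) += 1;
    }
    groups
}
```

Because the key is written once and never updated, the same bytes can later be decomposed back into columns when the output batch is built.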
**[Minor adapt]** For aggregation state, we should adapt it slightly so that
fields are initialized and updated at word-aligned offsets (which is CPU
friendly), much as you suggested:
```
let state = RowWriter::new()
    .for_aggregate(aggregate_exprs)
    .build();
```
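As a rough sketch of what word-aligned state could look like, the following keeps every aggregate field in its own 8-byte word so it can be updated in place. `AggState` and its methods are hypothetical names for illustration only and are not the `RowWriter` API:

```rust
/// Hypothetical aggregation state: one 8-byte word per field.
/// Backing the state with `Vec<u64>` keeps every field 8-byte aligned.
struct AggState {
    words: Vec<u64>,
}

impl AggState {
    /// All-zero bits read back as 0 for i64 and 0.0 for f64.
    fn new(num_fields: usize) -> Self {
        AggState { words: vec![0; num_fields] }
    }

    /// In-place update of an integer accumulator (e.g. COUNT or an integer SUM).
    fn add_i64(&mut self, idx: usize, v: i64) {
        let cur = self.words[idx] as i64;
        self.words[idx] = cur.wrapping_add(v) as u64;
    }

    /// In-place update of a floating-point accumulator (e.g. a float SUM).
    fn add_f64(&mut self, idx: usize, v: f64) {
        let cur = f64::from_bits(self.words[idx]);
        self.words[idx] = (cur + v).to_bits();
    }

    fn get_i64(&self, idx: usize) -> i64 {
        self.words[idx] as i64
    }

    fn get_f64(&self, idx: usize) -> f64 {
        f64::from_bits(self.words[idx])
    }
}
```

Unlike the payload case, this state is read and rewritten on every input row, which is why fixed, aligned offsets matter here.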
**[Minor adapt]** For a composite sort key with no varlena, we should remove
the null-bits part, pad the bytes of null attributes with all 0x00 or all 0xFF
(depending on whether the sort option is nulls first or nulls last), and
compare the raw bytes directly.
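A minimal sketch of that idea, assuming unsigned 32-bit columns in ascending order and big-endian encoding so that byte order matches numeric order. `encode_sort_key` is a hypothetical helper, not part of this PR:

```rust
/// Hypothetical fixed-width sort key: 4 bytes per column, no null bits.
/// Null columns are padded with 0x00 (nulls first) or 0xFF (nulls last).
/// Note: in this sketch a non-null value whose bytes equal the padding
/// compares equal to null.
fn encode_sort_key(cols: &[Option<u32>], nulls_first: bool) -> Vec<u8> {
    let pad = if nulls_first { 0x00 } else { 0xFF };
    let mut key = Vec::with_capacity(cols.len() * 4);
    for col in cols {
        match col {
            // Big-endian, so lexicographic byte order equals numeric order.
            Some(v) => key.extend_from_slice(&v.to_be_bytes()),
            None => key.extend_from_slice(&[pad; 4]),
        }
    }
    key
}
```

Since every key has the same width, sorting can compare the byte vectors directly (for example with `keys.sort()` on a `Vec<Vec<u8>>`) without looking at column types.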
**[NOT FIT]** For a composite sort key in which a variable-length attribute
(varlena) exists and is not the last attribute, the current row format does not
allow direct comparison of raw bytes. We would need to store the varlena in
place and pad all sort keys to the longest width, after which we could compare
them directly using raw bytes.
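One way to picture the padded layout, as a sketch under stated assumptions: a (string, u32) key where the string is stored in place, padded with 0x00 to the longest string in the batch, and followed by the big-endian integer. `encode_padded_keys` is a hypothetical helper, and the 0x00 padding means strings that differ only by trailing zero bytes would tie:

```rust
/// Hypothetical encoder for sort keys of the form (string, u32).
/// Every string is padded with 0x00 to the longest string width, so the
/// integer column starts at the same offset in every key.
fn encode_padded_keys(rows: &[(&str, u32)]) -> Vec<Vec<u8>> {
    let width = rows.iter().map(|(s, _)| s.len()).max().unwrap_or(0);
    rows.iter()
        .map(|(s, n)| {
            let mut key = Vec::with_capacity(width + 4);
            key.extend_from_slice(s.as_bytes());
            // Pad the varlena in place up to the common width.
            key.resize(width, 0x00);
            key.extend_from_slice(&n.to_be_bytes());
            key
        })
        .collect()
}
```

The cost is that one long value widens every key in the batch, which is part of why this case does not fit the current format.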