[GitHub] [arrow-datafusion] yjshen commented on a diff in pull request #2261: Introduce RowLayout to represent rows for different purposes

GitBox Tue, 19 Apr 2022 09:44:53 -0700


yjshen commented on code in PR #2261:
URL: https://github.com/apache/arrow-datafusion/pull/2261#discussion_r853283598



##########
datafusion/core/src/row/layout.rs:
##########
@@ -50,13 +118,23 @@ fn type_width(dt: &DataType) -> usize {
     }
 }
 
+fn word_aligned_offsets(null_width: usize, schema: &Arc<Schema>) -> (Vec<usize>, usize) {
+    let mut offsets = vec![];
+    let mut offset = null_width;
+    for _ in schema.fields() {
+        offsets.push(offset);
+        offset += 8; // a 8-bytes word for each field

Review Comment:
   I'll turn this into a function that chooses between 8 and 16 bytes based on 
the field's DataType. 
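
   A minimal sketch of what such a function could look like. `FieldType` is a 
hypothetical stand-in for arrow's `DataType` so the example compiles on its 
own, and treating a 128-bit decimal as the two-word case is an assumption, not 
something decided in this PR. The second return value here is the end offset 
of the fixed-width section; the diff above is cut off before its return 
expression, so that is also an assumption.

```rust
/// Hypothetical stand-in for arrow's `DataType`, reduced to a few variants
/// so the sketch has no external dependencies.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Boolean,
    Int64,
    Float64,
    Decimal128, // assumed to be the type that needs two 8-byte words
}

/// Width in bytes of the word-aligned slot for one field: 8 or 16.
fn word_aligned_width(ft: FieldType) -> usize {
    match ft {
        FieldType::Decimal128 => 16,
        _ => 8,
    }
}

/// Returns the starting offset of every field and the offset just past the
/// last field. Offsets start after the `null_width` bytes of null bits.
fn word_aligned_offsets(null_width: usize, fields: &[FieldType]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = null_width;
    for ft in fields {
        offsets.push(offset);
        offset += word_aligned_width(*ft);
    }
    (offsets, offset)
}
```

   With `null_width = 8` and fields `[Int64, Decimal128, Boolean]`, the slots 
start at 8, 16, and 32, and the fixed-width section ends at 40.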
   
   > Also it would be nice to know variable length data works here
   
   I haven't made up my mind yet whether to store the grouping keys and the 
group state as one row, or as two separate rows with a 
JoinedRow { key: `Vec<u8>`, state: `Vec<u8>` } for the hash table in aggregate.
   
   For the group states alone, though, I'd prefer not to handle variable-length 
states such as an array, a median, or the sketches used by the approximate 
aggregates. In-place updates are not efficient for these states, so we'd better 
keep using Vec<ScalarValue> states for them.
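
   To illustrate why fixed-width states suit in-place updates, here is a rough 
sketch under stated assumptions. The `JoinedRow` shape follows the comment 
above, but its use as a hash table value, the `accumulate_sum` and `read_sum` 
helpers, and the little-endian `i64` sum stored in the first 8 bytes of the 
state row are all hypothetical and not part of this PR.

```rust
use std::collections::HashMap;

/// Hypothetical `JoinedRow`: grouping key and group state kept as two
/// separate byte rows.
#[derive(Debug)]
struct JoinedRow {
    key: Vec<u8>,
    state: Vec<u8>,
}

/// Reads the running sum stored in the first 8 bytes of the state row.
fn read_sum(row: &JoinedRow) -> i64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&row.state[..8]);
    i64::from_le_bytes(buf)
}

/// Adds `value` to the group's sum by overwriting the same 8 bytes.
/// The state row never grows, so no reallocation is needed on update.
fn accumulate_sum(table: &mut HashMap<Vec<u8>, JoinedRow>, key: &[u8], value: i64) {
    let row = table.entry(key.to_vec()).or_insert_with(|| JoinedRow {
        key: key.to_vec(),
        state: vec![0u8; 8], // one word-aligned slot, initialised to zero
    });
    let sum = read_sum(row).wrapping_add(value);
    row.state[..8].copy_from_slice(&sum.to_le_bytes());
}
```

   A variable-length state such as a median buffer would have to resize or 
reallocate `state` on update, which is the cost the comment above wants to 
avoid.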



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
