Jefffrey commented on code in PR #9000:
URL: https://github.com/apache/arrow-rs/pull/9000#discussion_r2643573644
##########
arrow-row/src/lib.rs:
##########
@@ -1951,11 +1956,26 @@ unsafe fn decode_column(
let child_array =
unsafe { converter.convert_raw(&mut sparse_data, validate_utf8) }?;
+
+ // track bytes consumed for rows that belong to this field
+ for (row_idx, child_row) in field_rows.iter() {
+ let remaining_len = sparse_data[*row_idx].len();
+ bytes_consumed[*row_idx] = 1 + child_row.len() - remaining_len;
+ }
Review Comment:
```suggestion
// ensure we advance past consumed bytes in rows
for (row_idx, child_row) in field_rows.iter() {
let remaining_len = sparse_data[*row_idx].len();
let consumed_length = 1 + child_row.len() - remaining_len;
rows[*row_idx] = &rows[*row_idx][consumed_length..];
}
```
Thoughts on inlining it like this? It would remove the need for a separate
`bytes_consumed` vec.
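
To make the arithmetic concrete, here is a minimal standalone sketch of the inlined approach, outside the context of `decode_column`. The helper name `advance_rows` and the `remaining` parameter (standing in for `sparse_data` after the child decode) are hypothetical and exist only for illustration; the sketch also assumes each child slice is the row minus a 1-byte prefix, which is where the `1 +` comes from.

```rust
/// Hypothetical helper, not part of arrow-row: advance each referenced row
/// past the bytes that a child decoder consumed.
///
/// * `rows`       - the full row slices, advanced in place.
/// * `field_rows` - (row index, child slice handed to the decoder); each
///                  child slice is assumed to be the row minus a 1-byte prefix.
/// * `remaining`  - what the decoder left unread, indexed by row
///                  (plays the role of `sparse_data` after `convert_raw`).
fn advance_rows<'a>(
    rows: &mut [&'a [u8]],
    field_rows: &[(usize, &'a [u8])],
    remaining: &[&'a [u8]],
) {
    for (row_idx, child_row) in field_rows {
        let remaining_len = remaining[*row_idx].len();
        // 1 prefix byte + the bytes the child decoder actually read
        let consumed_length = 1 + child_row.len() - remaining_len;
        // Re-slice the row directly, so no per-row counter needs to be stored
        rows[*row_idx] = &rows[*row_idx][consumed_length..];
    }
}
```

Rows that do not appear in `field_rows` are left untouched, which is the behaviour the separate `bytes_consumed` vec would otherwise need to encode with a zero entry.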
##########
arrow-row/src/lib.rs:
##########
@@ -1930,6 +1929,12 @@ unsafe fn decode_column(
let child_array =
unsafe { converter.convert_raw(&mut child_data, validate_utf8) }?;
+ // track bytes consumed by comparing original and remaining lengths
+ for (i, (row_idx, child_row)) in field_rows.iter().enumerate() {
+ let remaining_len = child_data[i].len();
+ bytes_consumed[*row_idx] = 1 + child_row.len() - remaining_len;
+ }
Review Comment:
```suggestion
for ((row_idx, original_bytes), remaining_bytes) in
field_rows.iter().zip(child_data)
{
bytes_consumed[*row_idx] =
1 + original_bytes.len() -
remaining_bytes.len();
}
```