mapleFU commented on code in PR #43763:
URL: https://github.com/apache/arrow/pull/43763#discussion_r1739641829
##########
cpp/src/arrow/compute/row/row_encoder_internal.h:
##########
@@ -270,14 +348,26 @@ class ARROW_EXPORT RowEncoder {
}
int32_t num_rows() const {
- return offsets_.size() == 0 ? 0 : static_cast<int32_t>(offsets_.size() - 1);
+ return offsets_.empty() ? 0 : static_cast<int32_t>(offsets_.size() - 1);
}
private:
ExecContext* ctx_{nullptr};
std::vector<std::shared_ptr<KeyEncoder>> encoders_;
+ // offsets_ vector stores the starting position (offset) of each encoded row
+ // within the bytes_ vector. This allows for quick access to individual rows.
+ //
+ // The size would be num_rows + 1 if not empty, the last element is the total
+ // length of the bytes_ vector.
Review Comment:
Unrelated to this issue: I'm thinking of an optimization here. We could define
a flag indicating that all the columns are fixed-size or null. If it is set, we
would not need to maintain the offsets at all; we could statically compute a
`fixed-row-size` and use it to seek to a given row.
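
A rough sketch of the idea, not taken from the PR: `RowLocator` and its members are hypothetical names, and the sketch assumes every column contributes a constant number of bytes per row whether or not the value is null. In fixed mode the row position is computed as `i * fixed_row_size`, so no offsets vector is stored.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): locates encoded rows inside a
// contiguous byte buffer, either via explicit offsets or via a fixed stride.
class RowLocator {
 public:
  // Variable-width mode: offsets is empty or has num_rows + 1 entries,
  // the last one being the total byte length.
  explicit RowLocator(std::vector<int32_t> offsets)
      : offsets_(std::move(offsets)) {}

  // Fixed-width mode: every row occupies exactly fixed_row_size bytes,
  // so the offsets vector stays empty.
  RowLocator(int32_t fixed_row_size, int32_t num_rows)
      : fixed_row_size_(fixed_row_size), num_rows_(num_rows), is_fixed_(true) {}

  int32_t num_rows() const {
    if (is_fixed_) return num_rows_;
    return offsets_.empty() ? 0 : static_cast<int32_t>(offsets_.size() - 1);
  }

  // Byte position where row i starts.
  int32_t row_begin(int32_t i) const {
    assert(i >= 0 && i < num_rows());
    return is_fixed_ ? i * fixed_row_size_ : offsets_[i];
  }

  // Byte length of row i.
  int32_t row_length(int32_t i) const {
    assert(i >= 0 && i < num_rows());
    return is_fixed_ ? fixed_row_size_ : offsets_[i + 1] - offsets_[i];
  }

 private:
  std::vector<int32_t> offsets_;
  int32_t fixed_row_size_ = 0;
  int32_t num_rows_ = 0;
  bool is_fixed_ = false;
};
```

With this shape, `RowLocator(8, 4).row_begin(3)` is computed as `3 * 8`, while the variable-width path still reads from the stored offsets. Whether the saved memory and the skipped offset bookkeeping are worth the extra branch would need measuring.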