hzuo commented on code in PR #5554:
URL: https://github.com/apache/arrow-rs/pull/5554#discussion_r1550275558
##########
arrow-ipc/src/reader.rs:
##########
@@ -428,14 +485,24 @@ impl<'a> ArrayReader<'a> {
}
}
-/// Creates a record batch from binary data using the `crate::RecordBatch` indexes and the `Schema`
+/// Creates a record batch from binary data using the `crate::RecordBatch` indexes and the `Schema`.
+///
+/// If `require_alignment` is true, this function will return an error if any array data in the
+/// input `buf` is not properly aligned.
+/// Under the hood it will use [`arrow_data::ArrayDataBuilder::build`] to construct [`arrow_data::ArrayData`].
+///
+/// If `require_alignment` is false, this function will automatically allocate a new aligned buffer
+/// and copy over the data if any array data in the input `buf` is not properly aligned.
+/// (Properly aligned array data will remain zero-copy.)
+/// Under the hood it will use [`arrow_data::ArrayDataBuilder::build_aligned`] to construct [`arrow_data::ArrayData`].
pub fn read_record_batch(
buf: &Buffer,
batch: crate::RecordBatch,
schema: SchemaRef,
dictionaries_by_id: &HashMap<i64, ArrayRef>,
projection: Option<&[usize]>,
metadata: &MetadataVersion,
+ require_alignment: bool,
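The two modes described in the doc comment above can be illustrated with a small, std-only Rust sketch. It is an analogy only, not the arrow-rs implementation: `read_i32_values` and `ReadError` are hypothetical names, and the sketch handles a single buffer of native-endian `i32` values rather than a whole record batch. A freshly allocated `Vec<i32>` is always aligned for `i32`, which is what makes the copy path a valid fallback.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Hypothetical error type for this sketch (arrow-rs has its own error type).
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The input is not aligned to `align` bytes and `require_alignment` was set.
    Misaligned { align: usize },
    /// The input length is not a whole number of `i32` values.
    BadLength { len: usize },
}

/// Hypothetical helper: view `bytes` as native-endian `i32` values.
///
/// * Aligned input is borrowed as-is (zero-copy).
/// * Misaligned input is rejected when `require_alignment` is true,
///   and copied into a new (and therefore aligned) `Vec<i32>` otherwise.
pub fn read_i32_values(bytes: &[u8], require_alignment: bool) -> Result<Cow<'_, [i32]>, ReadError> {
    let width = size_of::<i32>();
    if bytes.len() % width != 0 {
        return Err(ReadError::BadLength { len: bytes.len() });
    }
    let len = bytes.len() / width;

    if (bytes.as_ptr() as usize) % align_of::<i32>() == 0 {
        // SAFETY: the pointer is aligned for i32, `len * width` bytes are in bounds
        // and borrowed for the returned lifetime, and every bit pattern is a valid i32.
        let values = unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<i32>(), len) };
        return Ok(Cow::Borrowed(values));
    }

    if require_alignment {
        return Err(ReadError::Misaligned { align: align_of::<i32>() });
    }

    // Misaligned but tolerated: copy each 4-byte chunk into a new, aligned Vec.
    let copied: Vec<i32> = bytes
        .chunks_exact(width)
        .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Cow::Owned(copied))
}
```

The strict mode surfaces misalignment to the caller, while the lenient mode pays for a copy only when the input actually needs one.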
Review Comment:
That makes sense. I think you guys will have a vision for what goes into
the struct and what doesn't, and where in the codebase that struct
should be passed (it probably isn't specific to `read_record_batch`).
To keep this PR bounded, what I'll propose for now is to just add a private
version, e.g. `read_record_batch2`, that takes this extra `require_alignment: bool`
argument, and keep the signature of `pub fn read_record_batch` exactly as-is:
the public one calls the private one with `false`, so its behavior stays the same.
Once this is merged, you guys can spin up the work to figure out a struct that
collects the various parameters, and the `require_alignment` functionality can
eventually be exposed through that options struct.
Does keeping this PR bounded in this way sound good?
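The proposal above (a private variant that takes `require_alignment`, with the unchanged public function delegating to it with `false`) could look roughly like this. Everything here is a simplified, hypothetical stand-in: the real function takes `&Buffer`, `crate::RecordBatch`, `SchemaRef` and more, and `Batch`, `BatchError` and the placeholder decoding are invented for illustration only.

```rust
use std::mem::align_of;

/// Hypothetical stand-in for a decoded record batch: the u64 values read from `buf`.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub values: Vec<u64>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum BatchError {
    /// `buf` starts `by` bytes past an aligned address.
    Misaligned { by: usize },
}

/// Public entry point: signature unchanged, so existing callers keep compiling.
/// It passes `false`, so misaligned input is still accepted as before.
pub fn read_record_batch(buf: &[u8]) -> Result<Batch, BatchError> {
    read_record_batch2(buf, false)
}

/// Private variant carrying the new flag; not part of the public API yet.
fn read_record_batch2(buf: &[u8], require_alignment: bool) -> Result<Batch, BatchError> {
    let by = (buf.as_ptr() as usize) % align_of::<u64>();
    if require_alignment && by != 0 {
        return Err(BatchError::Misaligned { by });
    }
    // Placeholder decoding: copies every 8-byte chunk, which works at any alignment.
    // A real reader would borrow aligned data zero-copy and copy only misaligned data.
    let values: Vec<u64> = buf
        .chunks_exact(8)
        .map(|c| {
            let mut word = [0u8; 8];
            word.copy_from_slice(c);
            u64::from_ne_bytes(word)
        })
        .collect();
    Ok(Batch { values })
}
```

Keeping the public signature fixed matters because adding a parameter to a `pub fn` is a breaking change for every existing caller; the private variant lets the flag land now and be exposed later (for example through the options struct discussed above) without an extra breaking release.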