Marko Mikulicic created ARROW-12265:
---------------------------------------
Summary: flight_data_from_arrow_batch sends too much data
Key: ARROW-12265
URL: https://issues.apache.org/jira/browse/ARROW-12265
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC, Rust
Affects Versions: 4.0.0
Reporter: Marko Mikulicic
Arrow arrays can share the same backing store; for example, an array that is a
slice of another array is just a "view" into the other array's buffers.
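To illustrate the sharing described above, here is a minimal sketch assuming a
simplified, hypothetical `Int32View` type (not the arrow crate's API) that
models a primitive array as a shared buffer plus an (offset, len) window.
Slicing only bumps a reference count and adjusts the window; no values are
copied.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for an Arrow primitive array:
/// a shared backing buffer plus an (offset, len) window into it.
#[derive(Clone, Debug)]
pub struct Int32View {
    buffer: Arc<Vec<i32>>, // backing store, possibly shared with other views
    offset: usize,         // index of this view's first element in `buffer`
    len: usize,            // number of logical elements in this view
}

impl Int32View {
    pub fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Int32View { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: clones the Arc (refcount bump) and narrows the window.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The logical values visible through this view.
    pub fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// True if both views point at the same backing allocation.
    pub fn shares_buffer_with(&self, other: &Int32View) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
    }
}
```

For example, `Int32View::new((0..1000).collect()).slice(0, 3)` exposes only
three values, yet it still holds the full 1000-element allocation alive.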
Yet when `flight_data_from_arrow_batch` encodes the arrays into a FlightData,
it blindly copies each entire backing buffer into the message that is sent
over the wire, regardless of how much of it the array actually uses.
Thus, for example, when DataFusion uses the `arrow::compute::limit` operator to
return a few elements of an array, the full (and potentially large) array
still ends up being sent over the wire.
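The effect can be modelled with a small sketch. It assumes a hypothetical
`Int32View` (shared buffer plus offset and length), a hypothetical `limit`
that mirrors the zero-copy slicing described above, and a hypothetical
`encode_whole_buffer` that imitates the reported behaviour by serialising the
whole backing buffer. None of these are the real arrow or Flight APIs.

```rust
use std::sync::Arc;

/// Hypothetical sliced array: shared buffer plus an (offset, len) window.
pub struct Int32View {
    pub buffer: Arc<Vec<i32>>,
    pub offset: usize,
    pub len: usize,
}

/// Hypothetical stand-in for a limit kernel: a zero-copy view of the
/// first `n` elements that still references the original buffer.
pub fn limit(array: &Int32View, n: usize) -> Int32View {
    Int32View {
        buffer: Arc::clone(&array.buffer),
        offset: array.offset,
        len: n.min(array.len),
    }
}

/// Naive encoder modelling the reported behaviour: it serialises every
/// element of the backing buffer, ignoring the view's offset and len.
pub fn encode_whole_buffer(array: &Int32View) -> Vec<u8> {
    array.buffer.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

With a one-million-element input, `limit(&big, 10)` yields a view of 10
values, but `encode_whole_buffer` on that view still produces 4,000,000 bytes.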
Since encoding the array into a FlightData involves copying the data anyway,
it would perhaps be beneficial to take the array's length into consideration
and copy only the parts of the buffer that contain actual data.
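A minimal sketch of the proposed approach, again assuming a hypothetical,
simplified `Int32View` rather than the arrow crate's types: the encoder copies
only the elements inside the view's window, so the output size is
proportional to the array's length rather than to its backing buffer.

```rust
use std::sync::Arc;

/// Hypothetical sliced array: shared buffer plus an (offset, len) window.
pub struct Int32View {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

/// Sketch of a length-aware encoder: serialises only the visible range
/// `offset..offset + len` as little-endian bytes.
pub fn encode_visible_range(array: &Int32View) -> Vec<u8> {
    let visible = &array.buffer[array.offset..array.offset + array.len];
    let mut out = Vec::with_capacity(visible.len() * std::mem::size_of::<i32>());
    for v in visible {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

For fixed-width values this is a single contiguous sub-range. A real fix
would likely need more care for other buffers: a validity bitmap may start
at a bit offset that is not byte-aligned, and variable-length types such as
strings carry an offsets buffer whose entries would have to be rebased.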
--
This message was sent by Atlassian Jira
(v8.3.4#803005)