Marko Mikulicic created ARROW-12265:
---------------------------------------

             Summary: flight_data_from_arrow_batch sends too much data
                 Key: ARROW-12265
                 URL: https://issues.apache.org/jira/browse/ARROW-12265
             Project: Apache Arrow
          Issue Type: Bug
          Components: FlightRPC, Rust
    Affects Versions: 4.0.0
            Reporter: Marko Mikulicic


Arrow arrays can share the same backing store, even if the array is just a 
"view" of a slice of another array.

Yet, when `flight_data_from_arrow_batch` encodes the arrays into a FlightData, 
it blindly copies the entire buffer ready to be sent over the wire.

Thus, for example, when DataFusion uses the `arrow::compute::limit` operator to 
return a few elements of an array, we still end up with a the full 
(potentially) large array being sent over the wire.

 

Since encoding the array in a FlightData involves copying the data anyway, 
perhaps it would be beneficial to take the Array length in consideration and 
copy only the parts of the buffer that contain actual data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to