[
https://issues.apache.org/jira/browse/ARROW-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112879#comment-17112879
]
Remi Dettai commented on ARROW-8875:
------------------------------------
Thanks for the heads up guys! It did not come up in my searches... I'm going to
blame the UX of the Jira search on this one ;)
This is exactly what I was thinking about.
nit [~apitrou]: I'm not sure I understand why you changed the signature of
GetObjectRange to return a GetObjectResult at
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L391|https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L383-L392]
. Simply returning the number of bytes would make the data flow more readable.
> [C++] use AWS SDK SetResponseStreamFactory to avoid a copy of bytes
> -------------------------------------------------------------------
>
> Key: ARROW-8875
> URL: https://issues.apache.org/jira/browse/ARROW-8875
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Remi Dettai
> Priority: Major
> Labels: C++, S3
> Fix For: 1.0.0
>
>
> Currently, in `GetObjectRange` of s3fs the `GetObjectRequest` has no
> `ResponseStreamFactory` assigned. This means that the bytes returned by the
> S3 API are first sent to a `std::basic_stringbuf`. To my understanding this
> has two performance impacts:
> * `std::basic_stringbuf` uses a growing array to buffer the response, which
> causes many allocations
> * on top of that, you have a copy operation from the `std::basic_stringbuf`
> when data is read into the Arrow buffer.
> This seems to be a bit costly.
> With `ResponseStreamFactory`, we might manage to get the data directly into
> the Arrow buffer.
> I can give it a try, but I would need some advice. Is there an existing
> utility to stream data into an Arrow buffer (if it exists, it is well
> hidden!)? Or should I stream the data into a plain array and then transfer
> ownership to Arrow?
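
On the last question in the description, one possible shape is a
`std::streambuf` whose put area is memory the caller already owns, so writes
land in the destination without an intermediate `std::basic_stringbuf`. The
sketch below uses only the standard library. `FixedBufferStreamBuf` and
`FillFrom` are hypothetical names for illustration, not Arrow or AWS SDK APIs,
and the wiring into `SetResponseStreamFactory` is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical helper: a streambuf that writes into a fixed, caller-owned
// memory region instead of a growing internal array.
class FixedBufferStreamBuf : public std::streambuf {
 public:
  FixedBufferStreamBuf(std::uint8_t* data, std::size_t size) {
    assert(data != nullptr);
    char* begin = reinterpret_cast<char*>(data);
    // The put area aliases the caller's memory, so no extra copy is made.
    setp(begin, begin + size);
  }

  // Number of bytes written into the caller's memory so far.
  std::size_t bytes_written() const {
    return static_cast<std::size_t>(pptr() - pbase());
  }

 protected:
  // Called when the put area is full. Never grow: report failure instead.
  // (This matches the default behaviour; it is spelled out for clarity.)
  int_type overflow(int_type /*ch*/) override { return traits_type::eof(); }
};

// Hypothetical stand-in for an HTTP client writing a response body into the
// stream. Returns how many bytes actually fit into `dest`.
std::size_t FillFrom(const std::string& body, std::uint8_t* dest,
                     std::size_t capacity) {
  FixedBufferStreamBuf buf(dest, capacity);
  std::iostream stream(&buf);
  stream.write(body.data(), static_cast<std::streamsize>(body.size()));
  return buf.bytes_written();
}
```

With an 8-byte destination, `FillFrom("hello", dest, 8)` reports 5 bytes
written, while a 10-byte body is truncated at 8 and the stream enters a failed
state. A real implementation would presumably size the destination from the
requested byte range up front.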
--
This message was sent by Atlassian Jira
(v8.3.4#803005)