bkietz commented on code in PR #37360:
URL: https://github.com/apache/arrow/pull/37360#discussion_r1307628219
##########
cpp/src/arrow/compute/function_internal.cc:
##########
@@ -83,7 +83,8 @@ Result<std::unique_ptr<FunctionOptions>>
GenericOptionsType::Deserialize(
Result<std::unique_ptr<FunctionOptions>> DeserializeFunctionOptions(
const Buffer& buffer) {
- io::BufferReader stream(buffer);
+ // Create a non-owned Buffer to avoid copying
Review Comment:
This makes me slightly nervous since some function options have scalar or
buffer DataMembers which *could* be zero copied (leaving that one with a buffer
which dangles because it points into this non-owning one). I can't immediately
identify where they are actually copied, but I wrote this ad hoc test to check
that copying does happen somewhere
```c++
TEST(TestCumulative, AdHoc) {
CumulativeOptions options;
options.start = MakeScalar("hello");
ASSERT_OK_AND_ASSIGN(auto buf, options.options_type()->Serialize(options));
ASSERT_OK_AND_ASSIGN(auto rt_start,
options.options_type()->Deserialize(*buf).Map([](auto
rt) {
return *checked_cast<CumulativeOptions&>(*rt).start;
}));
ASSERT_EQ(rt_start->type->id(), Type::STRING);
auto prbuf = [](const Buffer& buf) {
std::cout << "@" << reinterpret_cast<uintptr_t>(buf.data()) << "[" <<
buf.size()
<< "]\n";
};
prbuf(*buf);
prbuf(*checked_cast<const StringScalar&>(*rt_start).value);
}
```
I think we should add (a more formal version of) this test to ensure future
refactoring will never admit production of dangling function options.
Alternatively we could explicitly ensure deep copying here or in
GenericFromScalar
##########
cpp/src/arrow/flight/sql/client.cc:
##########
@@ -519,13 +519,17 @@ arrow::Result<std::shared_ptr<PreparedStatement>>
PreparedStatement::ParseRespon
std::shared_ptr<Schema> dataset_schema;
if (!serialized_dataset_schema.empty()) {
- io::BufferReader dataset_schema_reader(serialized_dataset_schema);
+ // Create a non-owned Buffer to avoid copying
Review Comment:
My above lifetime comment does not apply here since there's no way the
deserialized `Schema` could reference memory in the buffer we're reading; zero
copy lgtm
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]