[
https://issues.apache.org/jira/browse/ARROW-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Li updated ARROW-13253:
-----------------------------
Issue Type: Bug (was: Improvement)
> [C++][FlightRPC] Segfault when sending record batch >2GB
> --------------------------------------------------------
>
> Key: ARROW-13253
> URL: https://issues.apache.org/jira/browse/ARROW-13253
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC
> Affects Versions: 4.0.1
> Reporter: David Li
> Assignee: David Li
> Priority: Major
> Fix For: 5.0.0
>
>
> When sending a record batch > 2GiB, the server will segfault. Although Flight
> checks for this case and returns an error, it turns out that gRPC always
> tries to increment the refcount of the result buffer whether the
> serialization handler returned successfully or not:
> {code:cpp}
> // From gRPC 1.36
> Status CallOpSendMessage::SendMessagePtr(const M* message,
>                                          WriteOptions options) {
>   msg_ = message;
>   write_options_ = options;
>   // Store the serializer for later since we have access to the message
>   serializer_ = [this](const void* message) {
>     bool own_buf;
>     // TODO(vjpai): Remove the void below when possible
>     // The void in the template parameter below should not be needed
>     // (since it should be implicit) but is needed due to an observed
>     // difference in behavior between clang and gcc for certain internal users
>     Status result = SerializationTraits<M, void>::Serialize(
>         *static_cast<const M*>(message), send_buf_.bbuf_ptr(), &own_buf);
>     if (!own_buf) {
>       send_buf_.Duplicate();
>     }
>     return result;
>   };
>   return Status();
> }
> {code}
> Hence, when Flight returns an error without initializing the buffer, gRPC
> can still call send_buf_.Duplicate() on that buffer, and we get a segfault.
> Originally reported on StackOverflow:
> https://stackoverflow.com/questions/68230146/pyarrow-flight-do-get-segfault-when-pandas-dataframe-over-3gb
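
The failure mode described above can be sketched without gRPC or Arrow. The following is only an illustration under stated assumptions: FakeStatus, FakeByteBuffer, SerializeUnsafe, SerializeDefensive and CallerTouchesValidBuffer are hypothetical stand-ins, not gRPC or Flight APIs, and the defensive variant is one possible way for a serializer to protect itself, not necessarily the fix applied for this issue. Unlike the quoted gRPC code, the sketch initializes own_buf and makes Duplicate() report failure instead of crashing, so that it has defined behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not grpc::Status.
struct FakeStatus {
  bool ok;
  std::string message;
};

// Hypothetical stand-in for a ref-counted byte buffer handle that may be unset.
class FakeByteBuffer {
 public:
  bool Valid() const { return data_ != nullptr; }

  void Reset(std::string payload) {
    data_ = std::make_shared<std::string>(std::move(payload));
  }

  // Models "take another reference". Returns false when the buffer was never
  // set; this is the point where the real code would crash instead.
  bool Duplicate() {
    if (!data_) return false;
    extra_ref_ = data_;
    return true;
  }

 private:
  std::shared_ptr<std::string> data_;
  std::shared_ptr<std::string> extra_ref_;
};

// Assumed size limit for this sketch: the largest signed 32-bit value.
constexpr std::uint64_t kMaxMessageSize =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());

// Mirrors the failing pattern: on error, neither out-parameter is written.
FakeStatus SerializeUnsafe(std::uint64_t size, FakeByteBuffer* out,
                           bool* own_buf) {
  if (size > kMaxMessageSize) return {false, "message too large"};
  out->Reset("payload");
  *own_buf = true;
  return {true, ""};
}

// Defensive variant: both out-parameters are set before any early return.
FakeStatus SerializeDefensive(std::uint64_t size, FakeByteBuffer* out,
                              bool* own_buf) {
  out->Reset("");  // always leave a valid (empty) buffer behind
  *own_buf = true;
  if (size > kMaxMessageSize) return {false, "message too large"};
  out->Reset("payload");
  return {true, ""};
}

using SerializeFn = FakeStatus (*)(std::uint64_t, FakeByteBuffer*, bool*);

// Follows the shape of the quoted caller: the buffer is touched after
// serialization regardless of the returned status. Returns true only if
// that step operated on a valid buffer.
bool CallerTouchesValidBuffer(SerializeFn serialize, std::uint64_t size) {
  FakeByteBuffer buf;
  bool own_buf = false;  // initialized here, unlike in the quoted code
  FakeStatus status = serialize(size, &buf, &own_buf);
  (void)status;  // the status is deliberately not consulted, as in the quote
  if (!own_buf) {
    return buf.Duplicate();
  }
  return buf.Valid();
}
```

Calling CallerTouchesValidBuffer(SerializeUnsafe, 3ULL << 30) reaches Duplicate() on an unset buffer, whereas the same call with SerializeDefensive leaves the caller with a valid buffer even though serialization reported an error.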