[ 
https://issues.apache.org/jira/browse/ARROW-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-13253:
-----------------------------
    Description: 
When sending a record batch > 2GiB, the server will segfault. Although Flight 
checks for this case and returns an error, it turns out that gRPC always tries 
to increment the refcount of the result buffer whether the serialization 
handler returned successfully or not:
{code:cpp}
// From gRPC 1.36
Status CallOpSendMessage::SendMessagePtr(const M* message,
                                         WriteOptions options) {
  msg_ = message;
  write_options_ = options;
  // Store the serializer for later since we have access to the message
  serializer_ = [this](const void* message) {
    bool own_buf;
    // TODO(vjpai): Remove the void below when possible
    // The void in the template parameter below should not be needed
    // (since it should be implicit) but is needed due to an observed
    // difference in behavior between clang and gcc for certain internal users
    Status result = SerializationTraits<M, void>::Serialize(
        *static_cast<const M*>(message), send_buf_.bbuf_ptr(), &own_buf);
    if (!own_buf) {
      // XXX(lidavidm): This should perhaps check result.ok(), or Serialize
      // should unconditionally initialize send_buf_
      send_buf_.Duplicate();
    }
    return result;
  };
  return Status();
}
{code}
Hence when Flight returns an error without initializing the buffer, we get a 
segfault.
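
The failure pattern and the guard suggested in the XXX comment above can be sketched in isolation. This is only an illustration under stated assumptions: ToyStatus, ToyBuffer, ToySerialize, GuardedSend and kToyLimit are hypothetical stand-ins, not gRPC or Arrow types, and this is not necessarily how the issue was fixed in either project.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; not gRPC or Arrow types.
struct ToyStatus {
  bool ok_flag;
  std::string message;
  bool ok() const { return ok_flag; }
};

struct ToyBuffer {
  std::shared_ptr<std::string> data;  // null until a serializer fills it
  int extra_refs = 0;

  bool Valid() const { return data != nullptr; }

  // Stand-in for a refcount increment; requires an initialized buffer.
  void Duplicate() {
    assert(Valid());
    ++extra_refs;
  }
};

// Assumed size limit for this sketch: the largest value that fits in int32.
constexpr std::uint64_t kToyLimit =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());

// Returns an error for oversized payloads and leaves `out` uninitialized,
// mirroring the situation described in this issue.
ToyStatus ToySerialize(std::uint64_t payload_size, ToyBuffer* out,
                       bool* own_buf) {
  *own_buf = false;
  if (payload_size > kToyLimit) {
    return {false, "payload exceeds the size limit"};
  }
  out->data = std::make_shared<std::string>("payload");
  return {true, ""};
}

// Guarded caller: the buffer is only touched when serialization succeeded.
ToyStatus GuardedSend(std::uint64_t payload_size, ToyBuffer* send_buf) {
  bool own_buf = false;
  ToyStatus result = ToySerialize(payload_size, send_buf, &own_buf);
  if (result.ok() && !own_buf) {
    send_buf->Duplicate();
  }
  return result;
}
```

With the `result.ok()` check removed, an oversized payload would reach Duplicate() on a buffer that was never initialized, which corresponds to the crash described here.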

Originally reported on StackOverflow: 
[https://stackoverflow.com/questions/68230146/pyarrow-flight-do-get-segfault-when-pandas-dataframe-over-3gb]

  was:
When sending a record batch > 2GiB, the server will segfault. Although Flight 
checks for this case and returns an error, it turns out that gRPC always tries 
to increment the refcount of the result buffer whether the serialization 
handler returned successfully or not:
{code:cpp}
// From gRPC 1.36
Status CallOpSendMessage::SendMessagePtr(const M* message,
                                         WriteOptions options) {
  msg_ = message;
  write_options_ = options;
  // Store the serializer for later since we have access to the message
  serializer_ = [this](const void* message) {
    bool own_buf;
    // TODO(vjpai): Remove the void below when possible
    // The void in the template parameter below should not be needed
    // (since it should be implicit) but is needed due to an observed
    // difference in behavior between clang and gcc for certain internal users
    Status result = SerializationTraits<M, void>::Serialize(
        *static_cast<const M*>(message), send_buf_.bbuf_ptr(), &own_buf);
    if (!own_buf) {
      send_buf_.Duplicate();
    }
    return result;
  };
  return Status();
}
{code}
Hence when Flight returns an error without initializing the buffer, we get a 
segfault.

Originally reported on StackOverflow: 
https://stackoverflow.com/questions/68230146/pyarrow-flight-do-get-segfault-when-pandas-dataframe-over-3gb


> [C++][FlightRPC] Segfault when sending record batch >2GB
> --------------------------------------------------------
>
>                 Key: ARROW-13253
>                 URL: https://issues.apache.org/jira/browse/ARROW-13253
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, FlightRPC
>    Affects Versions: 4.0.1
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>             Fix For: 5.0.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
