Chengxin Ma created ARROW-7522:
----------------------------------

             Summary: Broken Record Batch returned from a function call
                 Key: ARROW-7522
                 URL: https://issues.apache.org/jira/browse/ARROW-7522
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, C++ - Plasma
    Affects Versions: 0.15.1
         Environment: macOS
            Reporter: Chengxin Ma

Scenario: retrieving a Record Batch from Plasma with a known Object ID.

The following code snippet works well:

{code:java}
int main(int argc, char **argv) {
  plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

  // Start up and connect a Plasma client.
  plasma::PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/store"));

  plasma::ObjectBuffer object_buffer;
  ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

  // Retrieve object data.
  auto buffer = object_buffer.data;

  arrow::io::BufferReader buffer_reader(buffer);
  std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
  ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

  std::shared_ptr<arrow::RecordBatch> record_batch;
  arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);

  std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
  std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
  std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
  std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
  std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
}
{code}

However, {{record_batch->column(0)->ToString()}} causes a segmentation fault if retrieving the Record Batch is wrapped in a function:

{code:java}
std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(plasma::ObjectID object_id) {
  // Start up and connect a Plasma client.
  plasma::PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/store"));

  plasma::ObjectBuffer object_buffer;
  ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

  // Retrieve object data.
  auto buffer = object_buffer.data;

  arrow::io::BufferReader buffer_reader(buffer);
  std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
  ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

  std::shared_ptr<arrow::RecordBatch> record_batch;
  arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);

  // Disconnect the client.
  ARROW_CHECK_OK(client.Disconnect());

  return record_batch;
}

int main(int argc, char **argv) {
  plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

  std::shared_ptr<arrow::RecordBatch> record_batch = GetRecordBatchFromPlasma(object_id);

  std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
  std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
  std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
  std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
  std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
}
{code}

The meta info of the Record Batch, such as the number of columns and rows, is still available, but I can't see the content of the columns.

{{lldb}} says that the stop reason is {{EXC_BAD_ACCESS}}, so I think the Record Batch is destroyed after {{GetRecordBatchFromPlasma}} finishes. But why can I still see the meta info of this Record Batch?

What is the proper way to get the Record Batch if we insist on using a function?
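
One possible explanation, stated here as an assumption rather than a confirmed diagnosis: the batch read from the buffer may be zero-copy, so its column data would point into memory owned by the Plasma client, while the schema and row count are ordinary heap values that outlive the client. Disconnecting the client (or letting it go out of scope) inside the function would then invalidate the column data but not the meta info. The sketch below illustrates that ownership pattern with plain standard-library types only. {{FakeStore}}, {{FakeBatch}}, {{OwnedBatch}} and {{GetBatch}} are hypothetical names, not Arrow or Plasma APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connected client: it owns the memory that
// batches point into. Not part of Arrow or Plasma.
class FakeStore {
 public:
  explicit FakeStore(std::vector<int64_t> values) : values_(std::move(values)) {}
  const int64_t* data() const { return values_.data(); }
  std::size_t size() const { return values_.size(); }

 private:
  std::vector<int64_t> values_;
};

// Hypothetical stand-in for a record batch: the row count is copied
// (meta info), but the column is only a pointer into the store (zero-copy).
struct FakeBatch {
  std::size_t num_rows;
  const int64_t* column;
};

// Bundles the batch with the store that backs it, so the pointed-to memory
// lives at least as long as the batch does.
struct OwnedBatch {
  std::shared_ptr<FakeStore> store;
  FakeBatch batch;

  int64_t value_at(std::size_t i) const {
    assert(i < batch.num_rows);
    return batch.column[i];
  }
};

// Returning only `batch` here would leave `column` dangling once `store`
// is destroyed at the end of the function; returning both avoids that.
OwnedBatch GetBatch() {
  auto store = std::make_shared<FakeStore>(std::vector<int64_t>{10, 20, 30});
  FakeBatch batch{store->size(), store->data()};
  return OwnedBatch{std::move(store), batch};
}
```

If the assumption holds, the analogous options for the function in this report would be to keep the Plasma client connected for as long as the Record Batch is in use (for example, by having the caller own the client and pass it in), or to copy the data out before disconnecting.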