mohabouje opened a new issue, #37144:
URL: https://github.com/apache/arrow/issues/37144

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I have a pipeline made up of several processing stages. One stage reads a 
file and decompresses it into a buffer. That buffer contains an Arrow Table. 
A downstream component takes the buffer and returns a table with the data.
   
   The code works, but it has a significant performance cost: it must first 
write the buffer to a file on disk and then read that file back with the 
Arrow file reader classes.
   
   I've been searching the documentation for a way to do this without 
writing the data to a file on disk. I could not make it work with a 
[RecordBatchStreamReader](https://arrow.apache.org/docs/cpp/api/ipc.html#_CPPv4N5arrow3ipc23RecordBatchStreamReaderE)
 or any of the alternatives I found in the docs.
   
   Do you have any working examples showing how to avoid this disk write? Is 
this even possible?
   
   Here's the code in question:
   
   ```cpp
   #include <fstream>
   #include <memory>
   #include <span>
   #include <vector>

   #include <arrow/api.h>
   #include <arrow/io/api.h>
   #include <arrow/ipc/api.h>

   auto read(std::span<const char> buffer) -> std::shared_ptr<arrow::Table> {
   
       // Wasting resources creating a temporary file...
       auto const output = "/tmp/random.arrow";
       auto stream       = std::ofstream(output, std::ios::binary | std::ios::out);
       stream.write(buffer.data(), buffer.size());
       stream.flush();
       stream.close();
   
       // Read the file back and convert it into an arrow::Table
       auto const file_stream = ::arrow::io::ReadableFile::Open(output);
       if (!file_stream.ok()) {
           // Opening the temporary file failed; do not touch the empty result
           return nullptr;
       }
       auto const ipc_reader  = ::arrow::ipc::RecordBatchFileReader::Open(file_stream.ValueUnsafe());
       if (!ipc_reader.ok()) {
           return nullptr;
       }
   
       auto const reader             = ipc_reader.ValueOrDie();
       auto const num_record_batches = reader->num_record_batches();
       auto batches                  = std::vector<std::shared_ptr<::arrow::RecordBatch>>(num_record_batches);
       for (auto i = 0; i < num_record_batches; ++i) {
           auto const batch = reader->ReadRecordBatch(i);
           if (!batch.ok()) {
               return nullptr;
           }
           batches[i] = batch.ValueOrDie();
       }
   
       auto const creation = ::arrow::Table::FromRecordBatches(batches);
       if (!creation.ok()) {
           return nullptr;
       }
       return creation.ValueOrDie();
   }
   ```
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
