gringasalpastor opened a new issue, #34056:
URL: https://github.com/apache/arrow/issues/34056

   ### Describe the enhancement requested
   
   Arrow is column-based, but clients often need to import external data 
that is stored row by row. To simplify that process, I propose we create a 
`RowsToBatches` utility function that takes any valid C++ range (a type for 
which `std::begin`/`std::end` are defined) and returns an 
`arrow::RecordBatchReader` (convertible to an `arrow::Table`). This is 
particularly useful when the data types for each column are not known at 
compile time, as in the case of a `std::variant`.
   
   The interface could look like the following (simplified for clarity):
   
   ```
   Result<std::shared_ptr<RecordBatchReader>> RowsToBatches(
       const std::shared_ptr<Schema>& schema, std::reference_wrapper<Range> rows,
       DataPointConvertor&& data_point_convertor);
   ```
   
   See linked pull request for full details. The client would only need to 
provide their `Schema` and a callable type that converts their structure’s 
types into the associated arrow types.
   
   If the client type is not a C++ range, they can either add iterators or 
write a wrapper/adaptor that provides the iterators for the type.
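   
   As a rough illustration of the wrapper/adaptor route, the sketch below 
   gives an index-based container the `begin()`/`end()` members that 
   `std::begin`/`std::end` look for. `LegacyTable`, `LegacyTableRange`, and 
   `SumAllValues` are hypothetical names invented for this example; none of 
   them come from Arrow or from the linked pull request, and the sketch 
   deliberately avoids any Arrow calls.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <iterator>
   #include <vector>
   
   // Hypothetical client type: rows are reachable only through an index-based
   // accessor, so std::begin/std::end cannot be applied to it directly.
   struct LegacyTable {
     std::vector<std::vector<int>> storage;
     std::size_t NumRows() const { return storage.size(); }
     const std::vector<int>& RowAt(std::size_t i) const { return storage[i]; }
   };
   
   // Hypothetical adaptor: a non-owning view over a LegacyTable that exposes
   // begin()/end(), which is all that std::begin/std::end require.
   class LegacyTableRange {
    public:
     class Iterator {
      public:
       using iterator_category = std::forward_iterator_tag;
       using value_type = std::vector<int>;
       using difference_type = std::ptrdiff_t;
       using pointer = const value_type*;
       using reference = const value_type&;
   
       Iterator() = default;
       Iterator(const LegacyTable* table, std::size_t index)
           : table_(table), index_(index) {}
   
       reference operator*() const {
         assert(table_ != nullptr && index_ < table_->NumRows());
         return table_->RowAt(index_);
       }
       pointer operator->() const { return &**this; }
       Iterator& operator++() { ++index_; return *this; }
       Iterator operator++(int) { Iterator old = *this; ++index_; return old; }
       bool operator==(const Iterator& other) const {
         return table_ == other.table_ && index_ == other.index_;
       }
       bool operator!=(const Iterator& other) const { return !(*this == other); }
   
      private:
       const LegacyTable* table_ = nullptr;
       std::size_t index_ = 0;
     };
   
     explicit LegacyTableRange(const LegacyTable& table) : table_(&table) {}
     Iterator begin() const { return Iterator(table_, 0); }
     Iterator end() const { return Iterator(table_, table_->NumRows()); }
   
    private:
     const LegacyTable* table_;  // the wrapped table must outlive this view
   };
   
   // Walks the adaptor with a range-based for loop to show that it behaves
   // like any other C++ range.
   inline int SumAllValues(const LegacyTableRange& rows) {
     int total = 0;
     for (const auto& row : rows) {
       for (int value : row) total += value;
     }
     return total;
   }
   ```
   
   With an adaptor along these lines, a `LegacyTableRange rows(table);` 
   object could presumably be handed to the proposed function as 
   `std::ref(rows)`, in the same way as the `std::vector` in the usage 
   example.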
   
   
   *Example Usage:*
   
   ```
   auto IntConvertor = [](ArrayBuilder& array_builder, int value) {
     return static_cast<Int64Builder&>(array_builder).Append(value);
   };
   std::vector<std::vector<int>> data = {{1, 2, 4}, {5, 6, 7}};
   auto batches = RowsToBatches(kTestSchema, std::ref(data), IntConvertor);
   ```
   
   *Example Supported Types:*
    - `std::vector<std::vector<std::variant<int, bsl::string>>>`
    - `std::vector<MyRowStruct>`
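   
   For the `std::variant` case, a convertor would likely dispatch on the 
   alternative held by each cell. The sketch below shows that dispatch with 
   `std::visit`. To keep it compilable without Arrow, `ColumnSink`, 
   `IntSink`, and `StringSink` are hypothetical stand-ins for Arrow's builder 
   classes, `AppendCell` and `AppendRows` are illustrative names, and 
   `std::string` is assumed in place of `bsl::string`. It is not the 
   implementation from the linked pull request.
   
   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <string>
   #include <type_traits>
   #include <variant>
   #include <vector>
   
   // Hypothetical stand-ins for a builder hierarchy (NOT Arrow's API).
   struct ColumnSink {
     virtual ~ColumnSink() = default;
   };
   struct IntSink : ColumnSink {
     std::vector<std::int64_t> values;
     bool Append(std::int64_t v) { values.push_back(v); return true; }
   };
   struct StringSink : ColumnSink {
     std::vector<std::string> values;
     bool Append(const std::string& v) { values.push_back(v); return true; }
   };
   
   using Cell = std::variant<int, std::string>;
   
   // Convertor sketch: picks the concrete sink from the alternative the cell
   // holds. As with the static_cast in the usage example, this assumes the
   // column really has the matching type; a mismatch is undefined behavior.
   inline bool AppendCell(ColumnSink& sink, const Cell& cell) {
     return std::visit(
         [&sink](const auto& value) {
           using T = std::decay_t<decltype(value)>;
           if constexpr (std::is_same_v<T, int>) {
             return static_cast<IntSink&>(sink).Append(value);
           } else {
             return static_cast<StringSink&>(sink).Append(value);
           }
         },
         cell);
   }
   
   // Feeds every row into the per-column sinks, one cell per column.
   // Returns false if a row's width does not match the number of columns.
   inline bool AppendRows(const std::vector<std::vector<Cell>>& rows,
                          const std::vector<ColumnSink*>& columns) {
     for (const auto& row : rows) {
       if (row.size() != columns.size()) return false;
       for (std::size_t i = 0; i < row.size(); ++i) {
         if (!AppendCell(*columns[i], row[i])) return false;
       }
     }
     return true;
   }
   ```
   
   The column types are only discovered at run time here, which is the 
   situation the proposal targets: the row type stays a plain 
   `std::vector<std::variant<...>>`, and the convertor decides per cell which 
   concrete builder to append to.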
   
   
   
   
   ### Component(s)
   
   C++

