gringasalpastor opened a new issue, #34056:
URL: https://github.com/apache/arrow/issues/34056
### Describe the enhancement requested
Arrow is column based, but clients often need to import external data
sources that are stored in a row-based fashion. To simplify this process, I
propose a `RowsToBatches` utility function that accepts any valid C++ range
(any type `T` for which `std::begin`/`std::end` are defined) and returns an
`arrow::RecordBatchReader`, which can be converted to an `arrow::Table`. This
is particularly useful when the data types of the columns are not known at
compile time, as in the case of a `std::variant`.
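At its core, the conversion turns row-major data into column-major data. The following is a minimal sketch of that transposition in plain C++17, with no Arrow dependency. `Cell`, `Column`, and `RowsToColumns` are hypothetical names, and each `Column` only stands in for a typed Arrow array. The run-time alternative held by each `std::variant` cell decides which vector it lands in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical cell type: each data point's type is only known at run time.
using Cell = std::variant<std::int64_t, std::string>;

// Hypothetical stand-in for one typed Arrow array. A column fills only the
// vector that matches the type of the cells it receives.
struct Column {
  std::vector<std::int64_t> ints;
  std::vector<std::string> strings;
};

// Transposes row-major cells into column-major storage: cell i of every row
// is appended to column i.
template <class Range>
std::vector<Column> RowsToColumns(const Range& rows, std::size_t num_columns) {
  std::vector<Column> columns(num_columns);
  for (const auto& row : rows) {
    std::size_t col = 0;
    for (const Cell& cell : row) {
      if (col >= num_columns) {
        throw std::invalid_argument("row has more cells than columns");
      }
      // Route the cell by the alternative it holds at run time.
      if (const auto* number = std::get_if<std::int64_t>(&cell)) {
        columns[col].ints.push_back(*number);
      } else {
        columns[col].strings.push_back(std::get<std::string>(cell));
      }
      ++col;
    }
  }
  return columns;
}
```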
The interface could look like the following (simplified for clarity):
```cpp
template <class Range, class DataPointConvertor>
Result<std::shared_ptr<RecordBatchReader>> RowsToBatches(
    const std::shared_ptr<Schema>& schema, std::reference_wrapper<Range> rows,
    DataPointConvertor&& data_point_convertor);
```
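The control flow behind such an interface could look roughly like the sketch below. It is not the implementation from the pull request. `ColumnBuilder`, `Batch`, `RowsToBatchesSketch`, and the `batch_size` parameter are hypothetical stand-ins for `arrow::ArrayBuilder`, `arrow::RecordBatch`, and whatever batching policy the real utility uses. The convertor here also returns nothing instead of an `arrow::Status`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a builder accumulates one column's values, and a
// batch holds the finished columns of a group of rows.
struct ColumnBuilder {
  std::vector<long long> values;
};
using Batch = std::vector<std::vector<long long>>;

// Walks the rows, hands each data point to the caller's convertor together
// with its column's builder, and cuts a new batch every `batch_size` rows.
template <class Range, class DataPointConvertor>
std::vector<Batch> RowsToBatchesSketch(std::size_t num_columns,
                                       std::reference_wrapper<Range> rows,
                                       DataPointConvertor&& convert,
                                       std::size_t batch_size) {
  assert(batch_size > 0 && "batch_size must be positive");
  std::vector<Batch> batches;
  std::vector<ColumnBuilder> builders(num_columns);
  std::size_t rows_in_batch = 0;

  // Moves the accumulated columns into a finished batch and resets builders.
  auto flush = [&] {
    Batch batch;
    for (auto& builder : builders) {
      batch.push_back(std::move(builder.values));
      builder.values.clear();  // moved-from vector: reset to a known state
    }
    batches.push_back(std::move(batch));
    rows_in_batch = 0;
  };

  for (const auto& row : rows.get()) {
    std::size_t col = 0;
    for (const auto& data_point : row) {
      // at() throws std::out_of_range if a row is wider than the schema.
      convert(builders.at(col++), data_point);
    }
    if (++rows_in_batch == batch_size) flush();
  }
  if (rows_in_batch > 0) flush();  // emit the final, partial batch
  return batches;
}
```

Taking `rows` as a `std::reference_wrapper` mirrors the proposed signature: the caller passes `std::ref(data)`, the rows are not copied, and `Range` is deduced from the wrapper.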
See the linked pull request for full details. The client would only need to
provide their `Schema` and a callable that converts each of their values into
the associated Arrow type (for example, by appending it to the column's
`ArrayBuilder`, as in the example below).
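For rows of `std::variant`, a convertor can dispatch on the alternative each data point holds. The sketch below assumes a hypothetical builder hierarchy (`BuilderBase`, `Int64ColumnBuilder`, `StringColumnBuilder`) in place of Arrow's `ArrayBuilder` subclasses, and a `bool` return in place of `arrow::Status`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical builder hierarchy mirroring arrow::ArrayBuilder and its typed
// subclasses; only what the convertor needs is modelled.
struct BuilderBase {
  virtual ~BuilderBase() = default;
};
struct Int64ColumnBuilder : BuilderBase {
  std::vector<long long> values;
};
struct StringColumnBuilder : BuilderBase {
  std::vector<std::string> values;
};

using DataPoint = std::variant<int, std::string>;

// std::visit selects the branch for the alternative held at run time, and
// dynamic_cast checks that the column's builder matches it. Returning false
// stands in for an error Status.
bool VariantConvertor(BuilderBase& builder, const DataPoint& point) {
  return std::visit(
      [&builder](const auto& value) -> bool {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, int>) {
          auto* ints = dynamic_cast<Int64ColumnBuilder*>(&builder);
          if (ints == nullptr) return false;  // column is not int64
          ints->values.push_back(value);
        } else {
          auto* strings = dynamic_cast<StringColumnBuilder*>(&builder);
          if (strings == nullptr) return false;  // column is not string
          strings->values.push_back(value);
        }
        return true;
      },
      point);
}
```

Using `dynamic_cast` instead of the `static_cast` in the example below trades a little speed for a clean error when a value does not match its column's type.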
If the client's type is not a C++ range, the client can either add iterators
to it or write a wrapper/adaptor that provides iterators for the type.
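As an illustration of the adaptor route, the sketch below wraps a hypothetical `LegacyTable` that only offers `RowCount()` and `GetRow(i)`; the `LegacyTableRows` wrapper is likewise hypothetical. Its `begin()`/`end()` members are enough for `std::begin`/`std::end` and range-based `for` loops to work, so the wrapper could be passed wherever a range is expected.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical client type: rows are reachable by index, but there are no
// iterators, so it is not a C++ range on its own.
class LegacyTable {
 public:
  explicit LegacyTable(std::vector<std::vector<int>> rows)
      : rows_(std::move(rows)) {}
  std::size_t RowCount() const { return rows_.size(); }
  const std::vector<int>& GetRow(std::size_t i) const { return rows_[i]; }

 private:
  std::vector<std::vector<int>> rows_;
};

// Non-owning adaptor that exposes a LegacyTable's rows through begin()/end().
class LegacyTableRows {
 public:
  class Iterator {
   public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = std::vector<int>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = const value_type&;

    Iterator() = default;
    Iterator(const LegacyTable* table, std::size_t index)
        : table_(table), index_(index) {}

    reference operator*() const { return table_->GetRow(index_); }
    Iterator& operator++() {
      ++index_;
      return *this;
    }
    Iterator operator++(int) {
      Iterator copy = *this;
      ++index_;
      return copy;
    }
    bool operator==(const Iterator& other) const {
      return table_ == other.table_ && index_ == other.index_;
    }
    bool operator!=(const Iterator& other) const { return !(*this == other); }

   private:
    const LegacyTable* table_ = nullptr;
    std::size_t index_ = 0;
  };

  explicit LegacyTableRows(const LegacyTable& table) : table_(&table) {}
  Iterator begin() const { return Iterator(table_, 0); }
  Iterator end() const { return Iterator(table_, table_->RowCount()); }

 private:
  const LegacyTable* table_;
};
```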
*Example Usage:*
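```cpp
// Appends each int to its column; assumes every column in kTestSchema is
// int64, so the builder can be downcast to Int64Builder.
auto IntConvertor = [](ArrayBuilder& array_builder, int value) {
  return static_cast<Int64Builder&>(array_builder).Append(value);
};
// Two rows of three values each, so kTestSchema has three fields.
std::vector<std::vector<int>> data = {{1, 2, 4}, {5, 6, 7}};
auto batches = RowsToBatches(kTestSchema, std::ref(data), IntConvertor);
```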
*Example Supported Types:*
- `std::vector<std::vector<std::variant<int, bsl::string>>>`
- `std::vector<MyRowStruct>`
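The issue does not spell out how a struct row such as `MyRowStruct` is split into data points. One possible approach, assumed here, is a small accessor that returns a row's fields in schema order as a range of variants, which a per-data-point convertor can then consume. The fields of `MyRowStruct`, as well as `DataPoint` and `AccessRow`, are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical row type from the list above.
struct MyRowStruct {
  int id;
  std::string name;
};

using DataPoint = std::variant<int, std::string>;

// Assumed row accessor: exposes the struct's fields, in schema order, as a
// range of variant data points.
std::vector<DataPoint> AccessRow(const MyRowStruct& row) {
  return {row.id, row.name};
}
```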
### Component(s)
C++
--
This is an automated message from the Apache Git Service.