simonthum opened a new issue, #35371:
URL: https://github.com/apache/arrow/issues/35371
### Describe the usage question you have. Please include as many useful details as possible.
I have a large-ish file with hundreds of record batches. I can read them
one by one successfully.
However, I would like to create an ML.NET DataFrame, which I suppose means I
should first join the record batches into a single large one. I tried this:
````
var rbBuilder = new RecordBatch.Builder(allocator);
using (var stream = File.OpenRead("p2_uc.arrow"))
using (var reader = new ArrowFileReader(stream, allocator))
{
    RecordBatch recordBatch;
    while ((recordBatch = await reader.ReadNextRecordBatchAsync()) != null)
    {
        rbBuilder.Append(recordBatch);
    }
}
var df = DataFrame.FromArrowRecordBatch(rbBuilder.Build());
````
However, I get an exception because the builder ends up with far too many fields. I
suppose RecordBatch.Builder.Append is not intended for that job.
I have not found any example of how to read an Arrow IPC file as a DataFrame.
Is that supported at all?
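For reference, one possible workaround is to convert each record batch to its own DataFrame and stack the rows, rather than merging batches on the Arrow side. The sketch below is untested and rests on several assumptions: that `DataFrame.Append(IEnumerable<DataFrameRow>, bool inPlace)` and `DataFrame.Clone()` are available in the installed Microsoft.Data.Analysis version, that every batch in the file shares one schema, and that the class name `ArrowFileToDataFrame` and method name `LoadAsync` are purely illustrative. The "too many fields" symptom is consistent with `RecordBatch.Builder.Append(RecordBatch)` adding the incoming batch's columns as new fields instead of extending the existing columns, though that reading has not been checked against the Arrow source.

````csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Microsoft.Data.Analysis;

// Hypothetical helper, not part of Apache.Arrow or Microsoft.Data.Analysis.
public static class ArrowFileToDataFrame
{
    // Reads every record batch in an Arrow IPC file and stacks them row-wise.
    // Returns null if the file contains no record batches.
    public static async Task<DataFrame> LoadAsync(string path)
    {
        DataFrame result = null;

        using (var stream = File.OpenRead(path))
        using (var reader = new ArrowFileReader(stream))
        {
            RecordBatch batch;
            while ((batch = await reader.ReadNextRecordBatchAsync()) != null)
            {
                // Each batch becomes a small DataFrame with the file's schema.
                DataFrame part = DataFrame.FromArrowRecordBatch(batch);

                if (result == null)
                {
                    // Assumption: Clone() gives the result its own buffers,
                    // independent of the first batch's Arrow memory.
                    result = part.Clone();
                }
                else
                {
                    // Append rows, not columns, so the column count stays fixed.
                    result.Append(part.Rows, inPlace: true);
                }
            }
        }

        return result;
    }
}
````

Called as `var df = await ArrowFileToDataFrame.LoadAsync("p2_uc.arrow");`, this should yield one DataFrame with the file's original column count. Appending row by row is likely to be slow for very large files, so it is a starting point rather than a tuned solution.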
### Component(s)
C#