[GitHub] [arrow] simonthum opened a new issue, #35371: How to read an Arrow IPC with multiple record batches in C#

via GitHub Sat, 29 Apr 2023 16:34:55 -0700


simonthum opened a new issue, #35371:
URL: https://github.com/apache/arrow/issues/35371


   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I have a large-ish file with hundreds of record batches. I can read them one by one successfully.
   
   However, I would like to create an ML.NET DataFrame, which I suppose means I should first join the record batches into a single large one. I tried this:
   
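   ````csharp
   using System.IO;
   using Apache.Arrow;
   using Apache.Arrow.Ipc;
   using Apache.Arrow.Memory;
   using Microsoft.Data.Analysis;

   // Allocator setup was not shown; NativeMemoryAllocator is an assumption.
   var allocator = new NativeMemoryAllocator();
   var rbBuilder = new RecordBatch.Builder(allocator);

   using (var stream = File.OpenRead("p2_uc.arrow"))
   using (var reader = new ArrowFileReader(stream, allocator))
   {
       RecordBatch recordBatch;
       // Read the file one record batch at a time until the reader returns null.
       while ((recordBatch = await reader.ReadNextRecordBatchAsync()) != null)
       {
           // Attempt to merge every batch into a single RecordBatch.
           rbBuilder.Append(recordBatch);
       }
   }

   var df = DataFrame.FromArrowRecordBatch(rbBuilder.Build());
   ````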
   
   However, this throws an exception because the builder ends up with far too many fields. I suppose `RecordBatch.Builder.Append` is not intended for this job.
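
   The "too many fields" symptom is consistent with `Append(RecordBatch)` adding each batch's columns to the builder as new fields, so the width grows with the number of batches, rather than appending rows. The sketch below is a toy model in plain C#, not the Apache.Arrow API: the tuple-based "batch" and the helpers `AppendColumns` and `ConcatRows` are hypothetical, and every batch is assumed to share the same schema and column order.

   ````csharp
   using System;
   using System.Collections.Generic;
   using System.Linq;

   // Toy model, NOT the Apache.Arrow API: a "batch" is a list of named int columns.
   var batchA = new List<(string Name, int[] Values)> { ("x", new[] { 1, 2 }), ("y", new[] { 10, 20 }) };
   var batchB = new List<(string Name, int[] Values)> { ("x", new[] { 3 }), ("y", new[] { 30 }) };
   var batches = new[] { batchA, batchB };

   // Column-wise append: every batch contributes its columns as new fields,
   // so the field count grows with the number of batches.
   List<(string Name, int[] Values)> AppendColumns(IEnumerable<List<(string Name, int[] Values)>> bs) =>
       bs.SelectMany(b => b).ToList();

   // Row-wise concatenation: keep the first batch's fields and join each column's
   // values across batches (assumes identical schema and column order).
   List<(string Name, int[] Values)> ConcatRows(IReadOnlyList<List<(string Name, int[] Values)>> bs) =>
       bs[0].Select((col, i) => (Name: col.Name, Values: bs.SelectMany(b => b[i].Values).ToArray())).ToList();

   var wide = AppendColumns(batches);
   var tall = ConcatRows(batches);

   Console.WriteLine($"column-wise: {wide.Count} fields, {wide.Max(c => c.Values.Length)} rows");
   Console.WriteLine($"row-wise:    {tall.Count} fields, {tall[0].Values.Length} rows");
   ````

   Running it prints 4 fields and 2 rows for the column-wise version and 2 fields and 3 rows for the row-wise one, which is the shape a single DataFrame would need.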
   
   I have not found any example of how to read an Arrow IPC file into a DataFrame. Is that supported at all?
   
   ### Component(s)
   
   C#


-- 
This is an automated message from the Apache Git Service.
