vivek1729 commented on issue #41683:
URL: https://github.com/apache/arrow/issues/41683#issuecomment-2153623831

   Thanks a lot for taking a look at this issue @trxcllnt . I'm not sure we can 
use the `readAll` abstraction, since we don't expect Arrow tables from our HTTP 
response. Instead we expect multiple sequences of record batches, where each 
sequence shares a common schema. Specifically, our data could look something like this:
   
   ```
   <recordBatch1_1Schema1><recordBatch1_2Schema1>|<recordBatch2_1Schema2>|...
   ```
   
   Yes, we are using async iterables in our code, as you suggested. Here's 
what the high-level code looks like for our case:
   
   ```typescript
   import { AsyncRecordBatchStreamReader, RecordBatch, Table } from 'apache-arrow';
   
   const resultTables: Table[] = [];
   const arrowReader = await AsyncRecordBatchStreamReader.from(responseStream);
   // autoDestroy: false keeps the reader open after each stream ends, so we
   // can continue reading the next sequence of batches (with its own schema).
   await arrowReader.open({ autoDestroy: false });
   while (true) {
       const batches: RecordBatch[] = [];
       let result: IteratorResult<RecordBatch>;
       // Collect batches until the current stream signals completion.
       while (!(result = await arrowReader.next()).done) {
           batches.push(result.value);
       }
       // An empty round means the underlying stream is exhausted.
       if (batches.length === 0) {
           break;
       }
       resultTables.push(new Table(batches));
   }
   ```
   
   Specifically, I noticed that we had to call `await arrowReader.open({ 
autoDestroy: false });`, otherwise the reader would auto-close after reading the 
first record batch.
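   
   For what it's worth, the control flow we're relying on is "collect batches until `done`, then repeat until an empty round". Here is a minimal self-contained sketch of that pattern with no Arrow involved, just to illustrate the loop shape; `makeFakeReader` and `collectGroups` are hypothetical names standing in for the stream reader and our outer loop:
   
   ```typescript
   // Minimal IteratorResult-like shape; { done: true } marks the end of one
   // logical stream within the source.
   type Result<T> = { done: boolean; value?: T };
   
   // Stand-in for AsyncRecordBatchStreamReader: yields the items of each
   // inner array, emitting { done: true } between arrays (hypothetical helper).
   function makeFakeReader<T>(streams: T[][]): { next(): Promise<Result<T>> } {
       let si = 0;
       let bi = 0;
       return {
           async next(): Promise<Result<T>> {
               if (si >= streams.length) return { done: true }; // source exhausted
               if (bi >= streams[si].length) {
                   // End of the current stream; advance to the next one.
                   si++;
                   bi = 0;
                   return { done: true };
               }
               return { done: false, value: streams[si][bi++] };
           },
       };
   }
   
   // Mirrors the outer/inner loop above: one group per stream, stopping when
   // a round yields no items at all.
   async function collectGroups<T>(
       reader: { next(): Promise<Result<T>> }
   ): Promise<T[][]> {
       const groups: T[][] = [];
       while (true) {
           const group: T[] = [];
           let result: Result<T>;
           while (!(result = await reader.next()).done) {
               group.push(result.value!);
           }
           if (group.length === 0) break; // empty round => nothing left to read
           groups.push(group);
       }
       return groups;
   }
   ```
   
   With an input of `[[1, 2], [3]]`, `collectGroups` produces one group per stream, which is exactly how we turn each batch sequence into its own `Table`.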
   
   Does our approach sound sensible?

