westonpace opened a new issue, #36778:
URL: https://github.com/apache/arrow/issues/36778

   ### Describe the enhancement requested
   
   We already have an asynchronous version in the form of the 
`GetRecordBatchGenerator` method.  However, that method does not respect the 
batch size property, and it cannot read less than one entire row group at a 
time.  This makes large Parquet files difficult to read and makes the 
scanner's memory usage heavily dependent on a Parquet file's row group size.
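   
   To illustrate the difference, here is a minimal sketch of how a reader that 
respects a batch size would split row groups into batches.  `PlanBatches` is a 
hypothetical helper written for this example; it is not part of the Arrow or 
Parquet API, and it assumes batches never span row-group boundaries.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only (not an Arrow/Parquet API).
// Given the number of rows in each row group and a batch size, return the
// row count of every batch a batch-size-respecting reader would emit.
// Assumption: a batch never spans two row groups.
std::vector<int64_t> PlanBatches(const std::vector<int64_t>& row_group_rows,
                                 int64_t batch_size) {
  std::vector<int64_t> batches;
  if (batch_size <= 0) {
    return batches;  // Nothing sensible to plan for a non-positive batch size.
  }
  for (int64_t remaining : row_group_rows) {
    // Slice the current row group into chunks of at most batch_size rows.
    while (remaining > 0) {
      const int64_t n = std::min(remaining, batch_size);
      batches.push_back(n);
      remaining -= n;
    }
  }
  return batches;
}
```

   With a reader that always yields whole row groups, the largest batch held in 
memory is the largest row group.  With the slicing above, it is bounded by the 
batch size regardless of how the file was written.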
   
   This issue requests a new method that is a closer analogue to the 
existing `ReadRowGroup`/`ReadRowGroups` methods.  Once the scanner moves over 
to this new method, I think we can deprecate `GetRecordBatchGenerator`.  I 
hesitate to replace it immediately as I don't want to introduce any breaking 
changes into the existing scanner path.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

