YoungRX opened a new issue, #15264:
URL: https://github.com/apache/arrow/issues/15264

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The error message is "Could not read from readahead_queue". The details are as follows:
   
   First, I set use_threads to false and batch_readahead to 0, and I set batch_size, dataset_schema, the projection, and the filter in setScanOptions():
   ```
       // project & filter
       setScanOptions();
   ```
   Second, I call the ParquetFileFormat::ScanBatchesAsync function to scan a ParquetFileFragment.
   
   Then I use the code below to get a std::shared_ptr<arrow::RecordBatch>.
   ```
       // parquetFileFormat is an object of ParquetFileFormat class
       // parquetFileFragment is an object of FileFragment class
       auto recordBatchIterator = arrow::MakeGeneratorIterator(
           parquetFileFormat->ScanBatchesAsync(parquetScanOptions, parquetFileFragment)
               .ValueOrDie());
       std::shared_ptr<arrow::RecordBatch> recordBatch;
       auto recordBatch_res = recordBatchIterator.Next();
       if (recordBatch_res.ok()) {
           recordBatch = recordBatch_res.ValueOrDie();
       }
   ```
   
   Finally, while debugging, I saw that the error message in recordBatch_res is "Could not read from readahead_queue".
   I found this error message in the SerialReadaheadGenerator class. It is probably caused by the member util::SpscQueue<std::shared_ptr<Future<T>>> readahead_queue_, where SpscQueue is defined as:
   ```
   template <typename T>
   using SpscQueue = arrow_vendored::folly::ProducerConsumerQueue<T>;
   ```
   **According to the code comments, the size of the underlying ProducerConsumerQueue must be greater than or equal to 2. Therefore, if I set batch_readahead to 0, the size of the ProducerConsumerQueue is 1, and an error occurs when I read the Parquet file.**
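   
   To illustrate why a size of 1 cannot work, here is a minimal single-threaded sketch of a ring buffer that keeps one slot empty to tell "full" from "empty", which is the convention the ProducerConsumerQueue comments describe (usable slots = size - 1). This is not the folly or Arrow code: RingQueueSketch and UsableSlots are hypothetical names, and the sketch ignores the atomics a real single-producer/single-consumer queue needs.
   
   ```cpp
   #include <cstddef>
   #include <vector>

   // Hypothetical model of a ring buffer with one permanently unused slot.
   // Assumes size >= 1; a real SPSC queue would use atomic indices.
   class RingQueueSketch {
    public:
     explicit RingQueueSketch(std::size_t size)
         : slots_(size), read_(0), write_(0) {}

     // Returns false when the queue is full.
     bool Write(int value) {
       std::size_t next = (write_ + 1) % slots_.size();
       if (next == read_) return false;  // full: one slot always stays empty
       slots_[write_] = value;
       write_ = next;
       return true;
     }

     // Returns false when the queue is empty.
     bool Read(int& out) {
       if (read_ == write_) return false;
       out = slots_[read_];
       read_ = (read_ + 1) % slots_.size();
       return true;
     }

    private:
     std::vector<int> slots_;
     std::size_t read_;
     std::size_t write_;
   };

   // Counts how many writes succeed on an empty queue of the given size.
   std::size_t UsableSlots(std::size_t size) {
     RingQueueSketch queue(size);
     std::size_t count = 0;
     while (queue.Write(0)) ++count;
     return count;
   }
   ```
   
   With this convention a queue of size 1 rejects every write, so it can never hold an element; size 2 is the smallest size that holds one.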
   
   > src\arrow\util\async_generator.h  583
   > src\arrow\vendored\ProducerConsumerQueue.h  75
   
   **My requirement is to avoid multithreading. However, even with use_threads set to false, if batch_readahead * batch_size is greater than the number of rows in the row group being read, multiple row groups are read at the same time. My existing code doesn't support multithreading, so the read fails.**
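   
   The condition above can be put into numbers with a small sketch. RowGroupsSpanned is a hypothetical helper, not an Arrow function. It only models the arithmetic described here, assuming equally sized row groups and a read that starts at a row group boundary, and it says nothing about how Arrow actually schedules reads.
   
   ```cpp
   #include <cstdint>

   // Hypothetical helper: how many row groups the rows buffered ahead
   // (batch_readahead * batch_size) would cover, under the assumptions above.
   // Assumes rows_per_row_group > 0.
   std::int64_t RowGroupsSpanned(std::int64_t batch_readahead,
                                 std::int64_t batch_size,
                                 std::int64_t rows_per_row_group) {
     std::int64_t rows_ahead = batch_readahead * batch_size;
     if (rows_ahead <= 0) return 0;
     // Ceiling division: a partially covered row group still counts.
     return (rows_ahead + rows_per_row_group - 1) / rows_per_row_group;
   }
   ```
   
   For example, with batch_readahead = 16, batch_size = 1024, and 10000 rows per row group, 16384 rows are buffered ahead, which covers two row groups.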
   
   **I therefore tried setting batch_readahead to 0, which triggers the error above. This may be a bug. Can you fix it, or help me avoid reading multiple row groups at the same time?**
   
   
   
   ### Component(s)
   
   C++


