gbronner opened a new issue, #41339:
URL: https://github.com/apache/arrow/issues/41339

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I have some code that iterates through the record batches of fairly large 
Parquet files.
   
   The code is:
    
       std::shared_ptr<arrow::RecordBatch> *curBatch = &m_curBatch;
       auto status = m_reader->ReadNext(curBatch);
   
   and the stack trace is
   ```
   <signal handler>
   #7  <signal handler called>
   #8  0x00007fb7a284f5cd in parquet::internal::(anonymous 
namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)7> 
>::bytes_for_values(long) const [clone .isra.1197] () from 
/lib64/libparquet.so.1500
   #9  0x00007fb7a28512bd in parquet::internal::(anonymous 
namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)5> 
>::ReserveValues(long) () from /lib64/libparquet.so.1500
   #10 0x00007fb7a27e7b98 in parquet::arrow::(anonymous 
namespace)::LeafReader::LoadBatch(long) () from /lib64/libparquet.so.1500
   #11 0x00007fb7a27f57d8 in parquet::arrow::ColumnReaderImpl::NextBatch(long, 
std::shared_ptr<arrow::ChunkedArray>*) () from /lib64/libparquet.so.1500
   #12 0x00007fb7a27eeec0 in 
arrow::Result<arrow::Iterator<std::shared_ptr<arrow::RecordBatch> > > 
arrow::Iterator<arrow::Iterator<std::shared_ptr<arrow::RecordBatch> > 
>::Next<arrow::FunctionIterator<parquet::arrow::(anonymous 
namespace)::FileReaderImpl::GetRecordBatchReader(std::vector<int, 
std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, 
std::unique_ptr<arrow::RecordBatchReader, 
std::default_delete<arrow::RecordBatchReader> >*)::{lambda()#1}, 
arrow::Iterator<std::shared_ptr<arrow::RecordBatch> > > >(void*) () from 
/lib64/libparquet.so.1500
   #13 0x00007fb7a27fa3ea in 
arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> >::Next() () from 
/lib64/libparquet.so.1500
   #14 0x00007fb7a27fa4aa in 
arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> >::Next() () from 
/lib64/libparquet.so.1500
   #15 0x00007fb7a27fa581 in arrow::Result<std::shared_ptr<arrow::RecordBatch> 
> arrow::Iterator<std::shared_ptr<arrow::RecordBatch> 
>::Next<arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> > >(void*) 
() from /lib64/libparquet.so.1500
   #16 0x00007fb7a27e7ac2 in parquet::arrow::(anonymous 
namespace)::RowGroupRecordBatchReader::ReadNext(std::shared_ptr<arrow::RecordBatch>*)
 () from /lib64/libparquet.so.1500
   
   
   ```
   
   I'm at a loss as to why this would happen.
   I've also seen some references to the error
   `Invalid: Buffer #1 too small in array of type int64 and length 3: expected 
at least 24 byte(s), got 0` when working with extremely wide Parquet files.
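
For reference, the numbers in that message are consistent with simple arithmetic: an int64 array of length 3 needs 3 × 8 = 24 bytes in its data buffer (buffer #1; buffer #0 is the validity bitmap), and the buffer that was actually supplied held 0 bytes. The sketch below only reproduces that size check. `MinDataBufferBytes` and `DataBufferLargeEnough` are hypothetical helper names used for illustration, not Arrow APIs, and the sketch assumes a fixed-width type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): minimum number of bytes the data
// buffer of a fixed-width array must hold, given the array length, its
// offset into the buffer, and the byte width of one value.
inline std::int64_t MinDataBufferBytes(std::int64_t length,
                                       std::int64_t offset,
                                       std::int64_t byte_width) {
  assert(length >= 0 && offset >= 0 && byte_width > 0);
  return (length + offset) * byte_width;
}

// Hypothetical helper: true when a buffer of `buffer_bytes` bytes is large
// enough to back the array described by the remaining arguments.
inline bool DataBufferLargeEnough(std::int64_t buffer_bytes,
                                  std::int64_t length,
                                  std::int64_t offset,
                                  std::int64_t byte_width) {
  return buffer_bytes >= MinDataBufferBytes(length, offset, byte_width);
}
```

With `length = 3`, `offset = 0`, and `byte_width = sizeof(std::int64_t)`, the required size is 24 bytes, so a 0-byte buffer fails the check. This matches the "expected at least 24 byte(s), got 0" wording and suggests the batch was assembled with an empty data buffer rather than a merely undersized one.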
   
   The Parquet file is fine -- I can read it with ReadTable, pyarrow, etc.  It 
even works if the batch size is sufficiently large to read it in one batch.
   
   
   Any ideas as to why it would run out of buffers even though I'm only reading 
batches of size 3?
   
   
   
   
   
   
   ### Component(s)
   
   C++


