gbronner opened a new issue, #41339:
URL: https://github.com/apache/arrow/issues/41339
### Describe the bug, including details regarding any error messages, version, and platform.
I have some code that iterates through the record batches of fairly
large Parquet files.
The code is:
std::shared_ptr<arrow::RecordBatch> *curBatch=&m_curBatch;
auto status=m_reader->ReadNext(curBatch);
and the stack trace is
```
<signal handler>
#7 <signal handler called>
#8 0x00007fb7a284f5cd in parquet::internal::(anonymous
namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)7>
>::bytes_for_values(long) const [clone .isra.1197] () from
/lib64/libparquet.so.1500
#9 0x00007fb7a28512bd in parquet::internal::(anonymous
namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)5>
>::ReserveValues(long) () from /lib64/libparquet.so.1500
#10 0x00007fb7a27e7b98 in parquet::arrow::(anonymous
namespace)::LeafReader::LoadBatch(long) () from /lib64/libparquet.so.1500
#11 0x00007fb7a27f57d8 in parquet::arrow::ColumnReaderImpl::NextBatch(long,
std::shared_ptr<arrow::ChunkedArray>*) () from /lib64/libparquet.so.1500
#12 0x00007fb7a27eeec0 in
arrow::Result<arrow::Iterator<std::shared_ptr<arrow::RecordBatch> > >
arrow::Iterator<arrow::Iterator<std::shared_ptr<arrow::RecordBatch> >
>::Next<arrow::FunctionIterator<parquet::arrow::(anonymous
namespace)::FileReaderImpl::GetRecordBatchReader(std::vector<int,
std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&,
std::unique_ptr<arrow::RecordBatchReader,
std::default_delete<arrow::RecordBatchReader> >*)::{lambda()#1},
arrow::Iterator<std::shared_ptr<arrow::RecordBatch> > > >(void*) () from
/lib64/libparquet.so.1500
#13 0x00007fb7a27fa3ea in
arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> >::Next() () from
/lib64/libparquet.so.1500
#14 0x00007fb7a27fa4aa in
arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> >::Next() () from
/lib64/libparquet.so.1500
#15 0x00007fb7a27fa581 in arrow::Result<std::shared_ptr<arrow::RecordBatch>
> arrow::Iterator<std::shared_ptr<arrow::RecordBatch>
>::Next<arrow::FlattenIterator<std::shared_ptr<arrow::RecordBatch> > >(void*)
() from /lib64/libparquet.so.1500
#16 0x00007fb7a27e7ac2 in parquet::arrow::(anonymous
namespace)::RowGroupRecordBatchReader::ReadNext(std::shared_ptr<arrow::RecordBatch>*)
() from /lib64/libparquet.so.1500
```
I'm at a loss as to why this would happen.
I've also seen the following error when working with extremely wide
Parquet files:
Invalid: Buffer #1 too small in array of type int64 and length 3: expected
at least 24 byte(s), got 0
The Parquet file itself is fine: I can read it with ReadTable, pyarrow, etc.
It even works if the batch size is large enough to read the file in one batch.
Any ideas as to why it would run out of buffers even though I'm only reading
with a batch size of 3?
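
For reference, the numbers in the `Invalid: Buffer #1 too small` message quoted above appear to follow from the array layout rather than from memory pressure: an int64 array of length 3 needs 3 × 8 = 24 bytes in its data buffer (buffer #1; buffer #0 is the validity bitmap), and the array being validated had a data buffer of 0 bytes. The sketch below only reproduces that arithmetic. The helper names are hypothetical and are not part of the Arrow API, and the formula is my understanding of the fixed-width size check, not a copy of the library's code.

```cpp
#include <cstdint>

// Hypothetical helper (not an Arrow API): minimum size in bytes of the data
// buffer for a fixed-width array holding `length` values that starts at
// element `offset`, where each value occupies `byte_width` bytes.
constexpr std::int64_t MinDataBufferBytes(std::int64_t length,
                                          std::int64_t offset,
                                          std::int64_t byte_width) {
  return (length + offset) * byte_width;
}

// Hypothetical helper: true when a data buffer of `actual_bytes` bytes is
// smaller than the minimum computed above.
constexpr bool DataBufferTooSmall(std::int64_t actual_bytes,
                                  std::int64_t length,
                                  std::int64_t offset,
                                  std::int64_t byte_width) {
  return actual_bytes < MinDataBufferBytes(length, offset, byte_width);
}

// The case from the message: int64 values (8 bytes each), length 3, offset 0.
static_assert(MinDataBufferBytes(3, 0, 8) == 24,
              "3 int64 values need 24 bytes");
static_assert(DataBufferTooSmall(0, 3, 0, 8),
              "an empty data buffer fails the size check");
```

If that reading is right, "got 0" means the batch was assembled with an empty values buffer, so the size of the batch (3 rows) is not what triggers the error.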
### Component(s)
C++