mattaubury opened a new issue, #39862:
URL: https://github.com/apache/arrow/issues/39862
### Describe the bug, including details regarding any error messages, version, and platform.
In Arrow-15.0.0 when using a threaded dataset scan I sometimes see the
following crash at exit:
```
#1 0x00007ffff691e351 in arrow::Status
arrow::internal::Executor::Spawn<arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::shared_ptr<arrow::FutureImpl>
const&, arrow::FutureImpl::CallbackRecord&&,
bool)::{lambda()#1}>(arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::shared_ptr<arrow::FutureImpl>
const&, arrow::FutureImpl::CallbackRecord&&, bool)::{lambda()#1}&&) () from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#2 0x00007ffff691e6ee in
arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::shared_ptr<arrow::FutureImpl>
const&, arrow::FutureImpl::CallbackRecord&&, bool) ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#3 0x00007ffff691e99d in
arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#4 0x00007ffff68c7454 in void
arrow::Future<arrow::internal::Empty>::MarkFinished<arrow::internal::Empty,
void>(arrow::Status) ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#5 0x00007ffff691bf1e in arrow::internal::FnOnce<void (arrow::FutureImpl
const&)>::FnImpl<arrow::Future<arrow::internal::Empty>::WrapStatusyOnComplete::Callback<arrow::AllComplete(std::vector<arrow::Future<arrow::internal::Empty>,
std::allocator<arrow::Future<arrow::internal::Empty> > >
const&)::{lambda(arrow::Status const&)#1}> >::invoke(arrow::FutureImpl const&)
() from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#6 0x00007ffff691e65c in
arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::shared_ptr<arrow::FutureImpl>
const&, arrow::FutureImpl::CallbackRecord&&, bool) ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#7 0x00007ffff691e99d in
arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#8 0x00007ffff68c0506 in arrow::internal::FnOnce<void
()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture
(arrow::Future<std::shared_ptr<arrow::Buffer> >,
arrow::io::RandomAccessFile::ReadAsync(arrow::io::IOContext const&, long,
long)::{lambda()#1})> >::invoke() ()
from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#9 0x00007ffff694d6c9 in
std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::{lambda()#1}>
> >::_M_run() () from
/jump/software/rhel8/apache-arrow-15.0.0-cxx20-gcc10/lib64/libarrow.so.1500
#10 0x00007ffff5190640 in std::execute_native_thread_routine (__p=0x777df0)
at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#11 0x00007ffff17021cf in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff4794dd3 in clone () from /lib64/libc.so.6
```
The crash is non-deterministic: it happens roughly 50% of the time on the
machine I'm testing on. It only appears when the program terminates
immediately after the end of the scan, so my guess is that the worker threads
are not being cancelled or joined correctly and are still running when the
program terminates.
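To make that guess concrete, here is a minimal standard-library sketch of the pattern I would expect to be safe: every worker is joined before the function that owns the shared state returns. This is not Arrow code. `run_and_join` is a hypothetical helper written only for illustration, and it says nothing about how Arrow's thread pool is actually shut down.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Arrow): starts `n_workers` threads that
// each bump a shared counter `n_increments` times, then joins every thread
// before returning so that no worker can outlive the shared state.
inline long run_and_join(std::size_t n_workers, long n_increments) {
  assert(n_increments >= 0);
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;
  workers.reserve(n_workers);
  for (std::size_t i = 0; i < n_workers; ++i) {
    workers.emplace_back([&counter, n_increments] {
      for (long k = 0; k < n_increments; ++k) {
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  // Joining is the important step. Destroying a joinable std::thread calls
  // std::terminate, and a detached worker could still be touching `counter`
  // after it has been destroyed.
  for (auto& worker : workers) {
    worker.join();
  }
  // join() synchronizes with each worker's completion, so all increments
  // are visible here.
  return counter.load();
}
```

Calling `run_and_join(4, 1000)` returns 4000 once all four workers have been joined. The suspected bug would be the opposite situation: a worker still running callbacks while the process is already tearing down its state.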
To create the input data:
```
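import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
# Round-trip the public taxi data through pandas and rewrite it locally.
url = "http://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet"
pq.write_table(pa.Table.from_pandas(pd.read_parquet(url)), "taxi.parquet")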
```
For some reason, using the downloaded Parquet directly does NOT show the
problem.
Then to produce the crash:
```
#include <arrow/dataset/api.h>
#include <arrow/filesystem/api.h>

#include <memory>
#include <string>
#include <utility>
#include <vector>

int
main ()
{
  const auto format = std::make_shared<arrow::dataset::ParquetFileFormat> ();
  const auto options = arrow::dataset::FileSystemFactoryOptions {};
  const std::shared_ptr<arrow::fs::FileSystem> filesystem =
    std::make_shared<arrow::fs::LocalFileSystem> ();
  std::vector<std::string> object_ids { "taxi.parquet" };

  auto factory = arrow::dataset::FileSystemDatasetFactory::Make (
                   filesystem, std::move (object_ids), format, options)
                   .ValueOrDie ();
  const auto full_dataset = factory->Finish ().ValueOrDie ();

  // Threaded scan over two projected columns.
  auto builder = full_dataset->NewScan ().ValueOrDie ();
  (void)builder->UseThreads (true);
  (void)builder->Project ({ "passenger_count", "trip_distance" });
  auto scanner = builder->Finish ().ValueOrDie ();

  // Read only the first 10 rows, then return from main () immediately;
  // the crash shows up as the program exits.
  (void)scanner->Head (10);
}
```
I compiled this with:
```
g++ -std=c++20 arrow_bug.cpp $(pkg-config arrow --cflags --libs) \
    $(pkg-config arrow-dataset --cflags --libs)
```
I've also seen this crash with 14.0.1 but cannot reproduce it on 12.0.0, so I
suspect a regression was introduced somewhere between those releases.
### Component(s)
C++, Parquet