mapleFU commented on issue #37487:
URL: https://github.com/apache/arrow/issues/37487#issuecomment-1701401892
I've found the cause, but I'm too tired today, so I'd like to fix it
tomorrow.
Changing `BufferReader::ReadAsync` as below and setting `ARROW_IO_THREADS`
to 1 reproduces the problem:
```c++
Future<std::shared_ptr<Buffer>> BufferReader::ReadAsync(const IOContext& ctx,
                                                        int64_t position,
                                                        int64_t nbytes) {
  return DeferNotOk(ctx.executor()->Submit([this, position, nbytes]() {
    return DoReadAt(position, nbytes);
  }));
}
```
The IO thread then blocks waiting for that task:
```c++
Future<std::optional<int64_t>> ParquetFileFormat::CountRows(
    const std::shared_ptr<FileFragment>& file, compute::Expression predicate,
    const std::shared_ptr<ScanOptions>& options) {
  auto parquet_file = checked_pointer_cast<ParquetFileFragment>(file);
  if (parquet_file->metadata()) {
    ARROW_ASSIGN_OR_RAISE(auto maybe_count,
                          parquet_file->TryCountRows(std::move(predicate)));
    return Future<std::optional<int64_t>>::MakeFinished(maybe_count);
  } else {
    return DeferNotOk(options->io_context.executor()->Submit(
        [parquet_file, predicate]() -> Result<std::optional<int64_t>> {
          // <- Here: EnsureCompleteMetadata submits a task and then waits
          // for it inside this IO task, causing the deadlock.
          RETURN_NOT_OK(parquet_file->EnsureCompleteMetadata());
          return parquet_file->TryCountRows(predicate);
        }));
  }
}
```
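To show the pattern in isolation, here is a minimal sketch that does not use Arrow at all. `SingleThreadExecutor` and `NestedWaitStalls` are hypothetical names made up for this illustration; the executor is a toy stand-in for an IO pool limited to one thread. The outer task submits an inner task to the same executor and waits for it. Because the only worker is busy running the outer task, the inner task cannot start. The sketch uses a timed wait so it terminates; an untimed wait at that point would block forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical toy executor with exactly one worker thread (not Arrow's API).
class SingleThreadExecutor {
 public:
  SingleThreadExecutor() : worker_([this] { Run(); }) {}

  ~SingleThreadExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any queued jobs before exiting
  }

  // Queue `fn` for the worker and return a future for its result.
  std::future<int> Submit(std::function<int()> fn) {
    assert(fn && "task must be callable");
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(fn));
    std::future<int> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop requested and nothing left to run
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Returns true if the inner task could not finish while the outer task,
// running on the only worker thread, was waiting for it.
bool NestedWaitStalls(std::chrono::milliseconds timeout) {
  SingleThreadExecutor executor;
  std::future<int> outer = executor.Submit([&executor, timeout] {
    // Same shape as the problem: a task on the executor submits another
    // task to that executor and then waits for it.
    std::future<int> inner = executor.Submit([] { return 42; });
    // Timed wait only so the sketch terminates; inner.get() would hang here.
    bool stalled = inner.wait_for(timeout) == std::future_status::timeout;
    return stalled ? 1 : 0;
  });
  return outer.get() == 1;
}
```

Calling `NestedWaitStalls(std::chrono::milliseconds(100))` from a `main` function should return `true`: the inner task only gets to run once the outer task has given up waiting and returned.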