chenwyi2 opened a new issue, #16758:
URL: https://github.com/apache/iceberg/issues/16758
### Apache Iceberg version
1.10.1
### Query engine
Starrocks
### Please describe the bug 🐞
## Summary
`ScanTaskIterable` (REST scan planning, Iceberg 1.10.1) can leave background
`PlanTaskWorker` tasks running after the scan is abandoned. This happens when:
1. The consumer stops iterating (query timeout / cancellation / client
disconnect in downstream engines), and
2. Only the outer `CloseableIterable` is closed, or nothing is closed at all.
In our production integration (StarRocks FE + Iceberg REST catalog), this
leads to all planning pool threads blocked in `offerWithTimeout()` for hours,
while new scans block in `hasNext()` with empty queues.
## Affected version / component
- Version: Apache Iceberg **1.10.1**
- Class: `org.apache.iceberg.rest.ScanTaskIterable`
- Related: `org.apache.iceberg.rest.RESTTableScan`
## Problem 1: `ScanTaskIterable.close()` is a no-op
`RESTTableScan` returns:
```java
return CloseableIterable.whenComplete(
new ScanTaskIterable(...),
this::cancelPlan);
```
`whenComplete` closes the wrapped iterable by calling `iterable.close()`, then
runs the completion callback (`this::cancelPlan`) in a `finally` block.
However, in 1.10.1:
```java
@Override
public void close() throws IOException {} // ScanTaskIterable.close() is empty
```
The `shutdown` flag is only set in:
```java
// ScanTasksIterator.close()
shutdown.set(true);
taskQueue.clear();
planTasks.clear();
```
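The ownership of the flag described above can be sketched in isolation. This is a minimal illustration, not the Iceberg source: `TaskIterable` and `TaskIterator` are hypothetical stand-ins for `ScanTaskIterable` and `ScanTasksIterator`, and the flag is exposed on the iterable only so the effect can be observed.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOwnershipSketch {
  // Hypothetical stand-in for ScanTaskIterable; not the Iceberg source.
  static class TaskIterable implements Closeable {
    // Exposed here only so the sketch can observe the flag.
    final AtomicBoolean shutdown = new AtomicBoolean(false);

    TaskIterator iterator() {
      return new TaskIterator(shutdown);
    }

    @Override
    public void close() {
      // Mirrors the reported 1.10.1 behavior: closing the iterable does nothing.
    }
  }

  // Hypothetical stand-in for ScanTasksIterator.
  static class TaskIterator implements Closeable {
    private final AtomicBoolean shutdown;

    TaskIterator(AtomicBoolean shutdown) {
      this.shutdown = shutdown;
    }

    @Override
    public void close() {
      // Only this path signals the background workers to stop.
      shutdown.set(true);
    }
  }

  public static void main(String[] args) {
    TaskIterable iterable = new TaskIterable();
    TaskIterator iterator = iterable.iterator();

    iterable.close();
    System.out.println("after iterable.close(): shutdown=" + iterable.shutdown.get());

    iterator.close();
    System.out.println("after iterator.close(): shutdown=" + iterable.shutdown.get());
  }
}
```

In this sketch, a caller that holds only the iterable has no way to set the flag, which is the situation the outer `whenComplete` wrapper is in.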
If the engine:
- closes only the outer iterable, or
- never closes the iterator because the query timed out or was cancelled during `hasNext()`,

then:
- `PlanTaskWorker` threads keep running,
- producers may block in `offerWithTimeout()` when `taskQueue` (capacity 1000) is full,
- consumers are gone, so the queue never drains,
- workers retry forever because `shutdown` remains `false`.
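The failure mode can be reproduced outside Iceberg with plain `java.util.concurrent` primitives. The following is a sketch under stated assumptions: `produce` is a hypothetical stand-in for the worker's offer-with-timeout loop, the queue capacity is reduced to 4, and the 50 ms retry interval is arbitrary. None of these names or values come from the Iceberg code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbandonedScanSketch {
  // Hypothetical stand-in for the worker's offer loop; not the Iceberg source.
  static void produce(BlockingQueue<Integer> queue, AtomicBoolean shutdown)
      throws InterruptedException {
    int task = 0;
    while (!shutdown.get()) {
      // Bounded wait so the shutdown flag is re-checked on every retry.
      if (queue.offer(task, 50, TimeUnit.MILLISECONDS)) {
        task++;
      }
    }
  }

  // Returns true if the worker finishes within waitMillis.
  // No consumer ever polls the queue, simulating an abandoned scan.
  static boolean workerExits(boolean setShutdown, long waitMillis) throws Exception {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
    AtomicBoolean shutdown = new AtomicBoolean(false);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> worker =
          pool.submit(
              () -> {
                produce(queue, shutdown);
                return null;
              });
      Thread.sleep(200); // give the worker time to fill the queue
      if (setShutdown) {
        shutdown.set(true);
      }
      try {
        worker.get(waitMillis, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException e) {
        return false;
      }
    } finally {
      // Interrupt the worker so the demo itself does not leak a thread.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("exits without shutdown: " + workerExits(false, 500));
    System.out.println("exits with shutdown: " + workerExits(true, 500));
  }
}
```

With no consumer and the flag left unset, the worker keeps retrying the offer until it is interrupted. Once the flag is set, the worker leaves the loop after its next offer timeout.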
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time