flyrain commented on PR #8132: URL: https://github.com/apache/iceberg/pull/8132#issuecomment-1650790503
IIUC, this is due to missing of resource cleanup for `iterable`(here are class `PositionStreamDeleteFilter` and `PositionStreamDeleteMarker`). We add the `deletePosIterator` to close group [here](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/deletes/Deletes.java#L233-L233), but when we use the iterable, we return its iterator by doing this: ``` return deleteFilter.filter(open(task, requiredSchema, idToConstant)).iterator(); ``` There are more places we need to change. For example, we didn't close the iterable [here](https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/ChangelogRowReader.java#L96-L96), and this [place](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L340-L340). Other than your PR, one of options is to close the iterable explicitly in the reader. Each reader extends the `CloseableGroup`, and every time there are a new iterable. We add it to the group by doing this: ``` addCloseable(iterable); ``` But I still does't feel comfortable about it. A developer can easily get an iterator from an iterable, and forget to close the iterable. I'm also not sure what's the best way for this. Open to suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
