flyrain commented on PR #8132:
URL: https://github.com/apache/iceberg/pull/8132#issuecomment-1650790503

   IIUC, this is due to missing of resource cleanup for `iterable`(here are 
class `PositionStreamDeleteFilter` and `PositionStreamDeleteMarker`). We add 
the `deletePosIterator` to close group 
[here](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/deletes/Deletes.java#L233-L233),
 but when we use the iterable, we return its iterator by doing this:
   ```
       return deleteFilter.filter(open(task, requiredSchema, 
idToConstant)).iterator();
   ```
   
   There are more places we need to change. For example, we didn't close the 
iterable 
[here](https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/ChangelogRowReader.java#L96-L96),
 and this 
[place](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L340-L340).
   
   Other than your PR, one of options is to close the iterable explicitly in 
the reader. Each reader extends the `CloseableGroup`, and every time there are 
a new iterable. We add it to the group by doing this:
   ```
   addCloseable(iterable);
   ```
   But I still does't feel comfortable about it. A developer can easily get an 
iterator from an iterable, and forget to close the iterable. I'm also not sure 
what's the best way for this. Open to suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to