Github user dhutchis commented on the issue:
https://github.com/apache/accumulo/pull/159
Bringing the discussion on exception handling back to the top level, and
adding more thoughts.
## Exceptions and Closing
Suppose you're an iterator. You call `next()` or `seek()` on your source
and it throws an IOException. I would treat this the same as if `hasTop()`
returned false; no more data is available from your source. The iterator stack
will not necessarily be torn down. The bottom of the iterator stack might seek
to the next range, for example (not sure if this actually happens).
Accumulo should *guarantee* that `close()` will be called before an
iterator is torn down, say via try-finally, except in extreme cases like OOM.
I see two approaches to doing this:
1. Specify the contract that every iterator *must* `close()` its source.
This strategy works if every iterator, system and user-defined, follows the
contract. Existing iterators would need to be modified to do this, so it's a
breaking change.
2. Retain the list of `IterInfo`s from when Accumulo sets up the iterator
stack (see `IteratorUtil.loadIterators()`). Before Accumulo tears down the
stack, for any reason, call `close()` on each iterator individually, say
starting from the root RFile/InMemoryMap iterators and working down (top-down
vs. bottom-up order may not matter). This strategy would not break existing
iterators; an iterator that does not call `close()` on its source is fine since
Accumulo would call it. It also provides better iterator isolation.
## Idempotent
I think `close()` *must* be idempotent; calling `close()` more than once
should have the same effect as calling `close()` once. We should document the
requirement. This would also suggest changing `AutoCloseable` to `Closeable`.
If we don't require idempotency, then option 2 above is not viable and we
should carefully engineer the implementation for *exactly-once* semantics.
Seems difficult. Also, can anyone think of a use case that requires
exactly-once semantics?
## Less certain ideas
Because discussion is good =)
### "Reason" for Closing
It would be awesome if the iterator knew *why* Accumulo is closing it. This
doesn't mesh well with the `AutoCloseable` interface. Otherwise I would
suggest passing a `Reason` interface parameter to the `close()` method.
Reasons include "the client stopped requesting tuples" or "switching source" or
"starting migration" or "tuple returned outside of seek fence for this tablet"
or "out of fairness to other concurrent scans" or "client cancelled scan".
Something like
```
@Override
default void close() {
close(Reason.UNKNOWN);
}
default void close(Reason reason) {
// default implementation, do nothing
}
```
It's fine if we don't do the `Reason`. It would make try-with-resources
awkward. Then again, how often (if ever) would iterators be constructed in
try-with-resources blocks?
I'm also not sure what use cases need this. It seems helpful to provide the
iterators with this information, but maybe it's not necessary.
### Put `close()` in `finalize()` for safety
Yes, we know the JVM does not guarantee `finalize()` would ever be called.
But it would be a nice "safety" against bugs in Accumulo that cause Accumulo to
forgo calling `close()` as it normally must. In SKVI.java:
```
@Override
protected void finalize() throws Throwable {
super.finalize();
close();
}
```
Obviously we cannot do this if we don't require `close()` idempotency.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---