Github user dhutchis commented on the issue:

    https://github.com/apache/accumulo/pull/159
  
    Bringing the discussion on exception handling back to the top level, and 
adding more thoughts.
    
    ## Exceptions and Closing
    
    Suppose you're an iterator.  You call `next()` or `seek()` on your source 
and it throws an IOException.  I would treat this the same as if `hasTop()` 
returned false; no more data is available from your source.  The iterator stack 
will not necessarily be torn down. The bottom of the iterator stack might seek 
to the next range, for example (not sure if this actually happens).
    
    Accumulo should *guarantee* that `close()` will be called before an 
iterator is torn down, say via try-finally, except in extreme cases like OOM.  
I see two approaches to doing this:
    
    1. Specify the contract that every iterator *must* `close()` its source.  
This strategy works if every iterator, system and user-defined, follows the 
contract.  Existing iterators would need to be modified to do this, so it's a 
breaking change.
    
    2. Retain the list of `IterInfo`s from when Accumulo sets up the iterator 
stack (see `IteratorUtil.loadIterators()`). Before Accumulo tears down the 
stack, for any reason, call `close()` on each iterator individually, say 
starting from the root RFile/InMemoryMap iterators and working down (top-down 
vs. bottom-up order may not matter). This strategy would not break existing 
iterators; an iterator that does not call `close()` on its source is fine since 
Accumulo would call it. It also provides better iterator isolation.
    
    ## Idempotent
    
    I think `close()` *must* be idempotent; calling `close()` more than once 
should have the same effect as calling `close()` once. We should document the 
requirement. This would also suggest changing `AutoCloseable` to `Closeable`.
    
    If we don't require idempotency, then option 2 above is not viable and we 
should carefully engineer the implementation for *exactly-once* semantics. 
Seems difficult. Also, can anyone think of a use case that requires 
exactly-once semantics?
    
    ## Less certain ideas
    
    Because discussion is good =)
    
    ### "Reason" for Closing
    
    It would be awesome if the iterator knew *why* Accumulo is closing it. This 
doesn't mesh well with the `AutoCloseable` interface.  Otherwise I would 
suggest passing a `Reason` interface parameter to the `close()` method.  
Reasons include "the client stopped requesting tuples" or "switching source" or 
"starting migration" or "tuple returned outside of seek fence for this tablet" 
or "out of fairness to other concurrent scans" or "client cancelled scan". 
Something like
    
    ```
    @Override
    default void close() {
      close(Reason.UNKNOWN);
    }
    
    default void close(Reason reason) {
      // default implementation, do nothing
    }
    ```
    
    It's fine if we don't do the `Reason`. It would make try-with-resources 
awkward. Then again, how often (if ever) would iterators be constructed in 
try-with-resources blocks?
    
    I'm also not sure what use cases need this. It seems helpful to provide the 
iterators with this information, but maybe it's not necessary.
    
    ### Put `close()` in `finalize()` for safety
    
    Yes, we know the JVM does not guarantee `finalize()` would ever be called. 
But it would be a nice "safety" against bugs in Accumulo that cause Accumulo to 
forgo calling `close()` as it normally must. In SKVI.java:
    
    ```
    @Override
    protected void finalize() throws Throwable {
      super.finalize();
      close();
    }
    ```
    
    Obviously we cannot do this if we don't require `close()` idempotency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to