Hi HBase dev community,

I'm Michael Miklavcic, a PMC member and committer on Apache Metron, and we're heavy users of Apache HBase. We're currently going through a major Hadoop stack upgrade that includes an upgrade from HBase 1.1.2 to 2.0.2, and we would appreciate some guidance on the new connection management guarantees.

The biggest risk and code change we see right now is the deprecation of the old HTableInterface client API: HBase connections are no longer managed under the hood by HTable. The new API suggests opening a long-running connection and opening/closing Tables retrieved from that connection in an ad-hoc manner.

We currently run long-lived HBase connections in a Storm topology, generally sharing those original tables on a per-thread basis. We do not go to any extraordinary lengths to close our open HBase tables; they are left open for the duration of the topology. There are some close/cleanup hooks, but I don't think they are consistently applied throughout the architecture.

In the new API, it's unclear to me what the connection retry/fail semantics will look like when a Table is created from a connection that is closed or has gone stale.
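
To make the pattern described above concrete, here is a minimal sketch of the lifecycle: one long-lived connection, with a lightweight table handle opened and closed around each operation. The `Connection` and `Table` interfaces here are hypothetical stand-ins rather than the real HBase classes, and `InMemoryConnection` and `writeRow` are illustration-only names. The stand-in throws an `IOException` once the connection is closed; that is an assumption of the sketch, not a claim about how HBase behaves. In real client code the connection would come from `ConnectionFactory.createConnection(conf)` and the handle from `connection.getTable(TableName.valueOf(...))`.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionLifecycleSketch {

    /** Hypothetical stand-in for org.apache.hadoop.hbase.client.Table. */
    interface Table extends AutoCloseable {
        void put(String rowKey, String value) throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Hypothetical stand-in for org.apache.hadoop.hbase.client.Connection. */
    interface Connection extends AutoCloseable {
        Table getTable(String tableName) throws IOException;

        @Override
        void close() throws IOException;
    }

    /** In-memory fake so the sketch runs without a cluster. */
    static final class InMemoryConnection implements Connection {
        private final Map<String, String> rows = new ConcurrentHashMap<>();
        private final AtomicInteger openTables = new AtomicInteger();
        private volatile boolean closed;

        @Override
        public Table getTable(String tableName) throws IOException {
            // Assumption of this fake only: a closed connection fails fast.
            if (closed) {
                throw new IOException("connection is closed");
            }
            openTables.incrementAndGet();
            return new Table() {
                private boolean tableClosed;

                @Override
                public void put(String rowKey, String value) throws IOException {
                    if (closed) {
                        throw new IOException("connection is closed");
                    }
                    rows.put(tableName + "/" + rowKey, value);
                }

                @Override
                public void close() {
                    // Closing the handle releases only the handle, not the connection.
                    if (!tableClosed) {
                        tableClosed = true;
                        openTables.decrementAndGet();
                    }
                }
            };
        }

        @Override
        public void close() {
            closed = true;
        }

        int rowCount() {
            return rows.size();
        }

        int openTableCount() {
            return openTables.get();
        }
    }

    /** Opens a short-lived table handle, writes one row, and closes the handle. */
    static void writeRow(Connection connection, String tableName, String rowKey, String value)
            throws IOException {
        try (Table table = connection.getTable(tableName)) {
            table.put(rowKey, value);
        }
    }

    public static void main(String[] args) throws IOException {
        // The connection outlives every table handle created from it.
        try (InMemoryConnection connection = new InMemoryConnection()) {
            writeRow(connection, "enrichment", "row-1", "value-1");
            writeRow(connection, "enrichment", "row-2", "value-2");
            System.out.println("rows=" + connection.rowCount()
                    + " openTables=" + connection.openTableCount());
        }
    }
}
```

In this shape, the connection is the only long-lived resource a thread has to own. What remains open is what happens to in-flight and future table handles when that connection goes stale, which is the subject of the questions that follow.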

1. Is there any logic built into the underlying table to refresh the connections, or is it entirely up to the client to fail, create a new connection, create a new table reference, and retry the operation?
2. What exception/retry semantics should we expect when performing a Table operation if the connection times out, other than perhaps an IOException?
3. How is a Table coupled to a connection under the hood in the new API?
4. We're looking to minimize the overall architectural impact of our upgrade. I took a go at it here (https://github.com/apache/metron/pull/1483/files#diff-d2799e20727b64e65da6f6ed2e95a2f0R56) by expanding on a "TableProvider" abstraction we leverage for our HBase interactions. I've isolated the connection management to this class on a per-thread basis for running in a long-running Storm topology.

Per #4, I'm wondering if this approach is reasonable, or whether we need to seriously consider completely rewriting how we manage our interactions with HBase, including a more robust connection pooling solution. We want to emphasize the smallest change possible, considering the overall risk of this major upgrade.

Best,
Mike Miklavcic
PMC Apache Metron
