Hi HBase dev community,

I'm Michael Miklavcic, a PMC member and committer on Apache Metron, and we're
heavy users of Apache HBase. We're currently going through a major Hadoop
stack upgrade that includes moving from HBase 1.1.2 to 2.0.2, and we would
appreciate some guidance on the new connection management guarantees. The
biggest risk and code change we see right now comes from the deprecation of
the old HTableInterface client API: HTable no longer manages HBase
connections for us under the hood. The new API instead suggests opening a
long-running connection and opening/closing Tables retrieved from that
connection as needed. We currently run long-lived HBase connections in a
Storm topology, generally sharing the tables we originally opened on a
per-thread basis. We do not go to any extraordinary lengths to close our
open HBase tables; they are left open for the lifetime of the topology.
There are some close/cleanup hooks, but I don't think they are consistently
applied throughout the architecture. With the new API, it's unclear to me
what the connection retry/failure semantics will look like when a Table is
created from a connection that is closed or has gone stale.
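To make sure I'm reading the 2.x pattern correctly, here is a rough sketch of
how I understand it: one long-lived connection, with a Table opened and
closed around each operation. To keep it self-contained, SketchConnection,
SketchTable, and InMemoryConnection are hypothetical stand-ins I made up, not
the HBase client classes (in real code they would correspond to
org.apache.hadoop.hbase.client.Connection and Table).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a Connection/Table pair; NOT the HBase client API.
public class SharedConnectionSketch {

    interface SketchTable extends Closeable {
        String get(String rowKey) throws IOException;
    }

    interface SketchConnection extends Closeable {
        SketchTable getTable(String tableName) throws IOException;
        boolean isClosed();
    }

    // In-memory connection that hands out lightweight table handles over shared data.
    static class InMemoryConnection implements SketchConnection {
        final Map<String, Map<String, String>> data = new HashMap<>();
        int tablesOpened = 0;
        int tablesClosed = 0;
        private boolean closed = false;

        @Override
        public SketchTable getTable(String tableName) throws IOException {
            if (closed) throw new IOException("connection is closed");
            tablesOpened++;
            Map<String, String> rows = data.computeIfAbsent(tableName, k -> new HashMap<>());
            return new SketchTable() {
                @Override public String get(String rowKey) { return rows.get(rowKey); }
                @Override public void close() { tablesClosed++; } // closes the handle only
            };
        }

        @Override public boolean isClosed() { return closed; }
        @Override public void close() { closed = true; }
    }

    // Per-operation pattern: open a table from the long-lived connection, use it,
    // close it; the connection itself stays open for the next operation.
    static String lookup(SketchConnection conn, String table, String row) throws IOException {
        try (SketchTable t = conn.getTable(table)) {
            return t.get(row);
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryConnection conn = new InMemoryConnection();
        conn.data.computeIfAbsent("example_table", k -> new HashMap<>()).put("row1", "value1");
        System.out.println(lookup(conn, "example_table", "row1")); // value1
        System.out.println(lookup(conn, "example_table", "row2")); // null
        conn.close(); // closed once, when the process/topology shuts down
    }
}
```

The open question for us is what happens inside lookup() when the shared
connection behind it has gone stale.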

   1. Is there any logic built into the underlying table to refresh the
   connections, or is it entirely up to the client to fail, create a new
   connection, create a new table reference, and retry the operation?
   2. What exception/retry semantics should we expect when performing a
   Table operation if the connection times out, other than perhaps an
   IOException?
   3. How is a Table coupled to a connection under the hood in the new API?
   4. We're looking to minimize the overall architectural impact of our
   upgrade. I took a first pass at it in
   https://github.com/apache/metron/pull/1483/files#diff-d2799e20727b64e65da6f6ed2e95a2f0R56
   by expanding on a "TableProvider" abstraction we use for our HBase
   interactions. I've isolated connection management to this class, on a
   per-thread basis, for running in a long-running Storm topology.
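For question 1, here is roughly what the "client does everything" option
would look like if we had to handle it ourselves inside a per-thread
provider: each thread lazily gets its own connection, and on an IOException
the provider discards that thread's connection, opens a new one, and retries
once. This is a hypothetical sketch under those assumptions, not the code in
the PR and not HBase's API; SketchConnection, SketchTable, TableOperation,
and staleOnceFactory are made-up names, and the single retry is an arbitrary
choice.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical per-thread connection holder with reconnect-and-retry.
// SketchConnection/SketchTable are stand-ins, NOT the HBase client classes.
public class PerThreadTableProviderSketch {

    interface SketchTable extends Closeable {
        String get(String rowKey) throws IOException;
    }

    interface SketchConnection extends Closeable {
        SketchTable getTable(String tableName) throws IOException;
    }

    @FunctionalInterface
    interface TableOperation<T> {
        T apply(SketchTable table) throws IOException;
    }

    private final Supplier<SketchConnection> connectionFactory;
    // One connection per thread (e.g. per Storm executor thread), created lazily.
    private final ThreadLocal<SketchConnection> connection = new ThreadLocal<>();

    PerThreadTableProviderSketch(Supplier<SketchConnection> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    private SketchConnection current() {
        SketchConnection c = connection.get();
        if (c == null) {
            c = connectionFactory.get();
            connection.set(c);
        }
        return c;
    }

    // Runs op against a freshly opened table. On IOException, discards this
    // thread's connection, opens a new one, and retries exactly once.
    <T> T execute(String tableName, TableOperation<T> op) throws IOException {
        try {
            return runOnce(current(), tableName, op);
        } catch (IOException first) {
            closeQuietly(connection.get());
            connection.remove();
            return runOnce(current(), tableName, op);
        }
    }

    private static <T> T runOnce(SketchConnection c, String tableName, TableOperation<T> op)
            throws IOException {
        try (SketchTable t = c.getTable(tableName)) {
            return op.apply(t);
        }
    }

    private static void closeQuietly(Closeable c) {
        if (c == null) return;
        try { c.close(); } catch (IOException ignored) { }
    }

    // Test double: the first connection it creates is "stale" and fails every call.
    static Supplier<SketchConnection> staleOnceFactory(AtomicInteger created) {
        return () -> {
            boolean stale = created.getAndIncrement() == 0;
            return new SketchConnection() {
                @Override public SketchTable getTable(String name) throws IOException {
                    if (stale) throw new IOException("stale connection");
                    return new SketchTable() {
                        @Override public String get(String rowKey) { return name + ":" + rowKey; }
                        @Override public void close() { }
                    };
                }
                @Override public void close() { }
            };
        };
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger created = new AtomicInteger();
        PerThreadTableProviderSketch provider =
                new PerThreadTableProviderSketch(staleOnceFactory(created));
        String result = provider.execute("t", table -> table.get("row1"));
        System.out.println(result);        // t:row1
        System.out.println(created.get()); // 2
    }
}
```

If the HBase 2.x Connection already recovers from these failures
internally, most of that catch block would be unnecessary, which is exactly
what questions 1 and 2 are trying to pin down.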

Per #4, I'm wondering whether this approach is reasonable, or whether we
need to seriously consider completely rewriting how we manage our
interactions with HBase, including a more robust connection pooling
solution. We want to keep the change as small as possible, given the
overall risk of this major upgrade.

Best,
Mike Miklavcic
PMC Apache Metron
