Hi devs,

As a major user of hbase, my company has thousands of clients deployed
which use the hbase client to connect to a variety of hbase clusters. We
have a common library which handles configuring all clients by setting up
the Configuration object prior to creating a Connection. Our library sets
configurations using the various configs in HConstants, but there are also
a bunch of configs which don't exist in HConstants. For these we have
hardcoded config strings in our client.
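
To make the two styles concrete, here is a minimal sketch of what such a library does. It is illustrative only: a plain Map stands in for the Configuration object, the nested Keys class stands in for HConstants (the one key shown mirrors HConstants.HBASE_CLIENT_RETRIES_NUMBER), and "hbase.client.example.private.setting" is a made-up key representing a config that lives outside HConstants.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientConfigSketch {
    // Stand-in for a shared constants holder such as HConstants (assumption:
    // only one key is shown, purely for illustration).
    static final class Keys {
        static final String RETRIES_NUMBER = "hbase.client.retries.number";

        private Keys() {
        }
    }

    // Stand-in for building a Configuration: a simple string-to-string map.
    static Map<String, String> buildClientConfig() {
        Map<String, String> conf = new HashMap<>();

        // Style 1: reference a constant. If the constant is deprecated or
        // removed in a later version, the editor or compiler flags this line.
        conf.put(Keys.RETRIES_NUMBER, "10");

        // Style 2: hardcoded string (hypothetical key). If the client stops
        // reading this key, nothing fails; the setting is silently ignored.
        conf.put("hbase.client.example.private.setting", "5");

        return conf;
    }

    public static void main(String[] args) {
        buildClientConfig().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```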

We're now working on an hbase upgrade and need to go through our client
library and check how the configs may have changed in the new version. This
is relatively easy to do for those HConstants cases -- configs may be
marked @Deprecated which will show up in one's editor, they may be removed
entirely which would show up as a compile error, and otherwise one can
easily click through or bring up the javadoc. For the others that don't
exist in HConstants, we need to go manually search the hbase codebase for
those strings.

Without doing this painstaking manual process, we would potentially deploy
the upgraded client with configs that the hbase client has deprecated or no
longer reads at all. For configs referenced through HConstants, the problem
is immediately obvious: if the field has been removed, compilation fails,
which is a clear signal to investigate the config. We prefer facing that
compile failure, because it's clearer than having something silently
disappear or change.

I opened 3 jiras to move some fields to HConstants, but got some reasonable
pushback from Duo:

https://issues.apache.org/jira/browse/HBASE-26845
https://issues.apache.org/jira/browse/HBASE-26846
https://issues.apache.org/jira/browse/HBASE-26847

Duo's pushback is that HConstants is an anti-pattern and these configs are
not part of our public API. I can agree that a catch-all constants class
might be an anti-pattern, but would argue that consolidating configs there
is very useful for end-users. I can also potentially agree that exposing
these as part of our public API might limit the flexibility of development
due to the compatibility constraints on IA.Public.

To me it seems odd to add a configuration, whose whole point is to make
something tunable, but then bury it in a private class even though it has
real implications for how the application runs. If a configuration is not
meant to be tuned, it shouldn't be a configuration at all. Otherwise it
should be exposed for reference.

I'm wondering if there is some compromise we can achieve which makes it
easier for end-users to integrate with tunable configs.

One can imagine a large project to clean up all of our configs under some
new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
(needing to migrate all configs) the enemy of good.

A better option might be to make those classes which expose configs
LimitedPrivate(CONFIG) -- for example AsyncProcess and
ConnectionImplementation. That might be the most incremental change we
could make. We could handle this on a case-by-case basis.
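
As a rough sketch of what that might look like (not a proposed patch): the real annotation would be InterfaceAudience.LimitedPrivate with HBaseInterfaceAudience.CONFIG, but to keep the example self-contained it defines a local stand-in annotation. ExampleClientComponent, EXAMPLE_TIMEOUT_KEY and its key string are hypothetical names, and the isConfigAudience helper is only there to show that the marker is something tooling could check.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

public class LimitedPrivateSketch {
    // Local stand-in for the audience annotation (assumption: the real one
    // comes from the project's audience-annotation library).
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface LimitedPrivate {
        String[] value();
    }

    // Stand-in for the CONFIG audience name.
    static final String CONFIG = "Configuration";

    // Hypothetical client-internal class whose config keys are exposed for
    // reference without making the whole class part of the public API.
    @LimitedPrivate(CONFIG)
    static class ExampleClientComponent {
        public static final String EXAMPLE_TIMEOUT_KEY = "hbase.client.example.timeout";
    }

    // Returns true if the class is annotated as LimitedPrivate for CONFIG.
    static boolean isConfigAudience(Class<?> type) {
        LimitedPrivate audience = type.getAnnotation(LimitedPrivate.class);
        return audience != null && Arrays.asList(audience.value()).contains(CONFIG);
    }

    public static void main(String[] args) {
        System.out.println(isConfigAudience(ExampleClientComponent.class));
        System.out.println(isConfigAudience(String.class));
    }
}
```

A client library could then reference ExampleClientComponent.EXAMPLE_TIMEOUT_KEY instead of a string literal and get the same deprecation and compile-time signals it gets from HConstants today.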

Does anyone have any thoughts?
