Although collecting all configuration keys into a single file is definitely an 
anti-pattern I’m not sure the same is true of package or Maven module level 
aggregation classes marked LP(CONFIG).Somewhat like DFSConfigKeys but geared 
toward our API/release auditing. 

This would seem virtuous for a couple of reasons. Relevant configuration key 
constants for the package or module would be grouped in a well known place for 
users and developers alike. The LP(CONFIG) designation would require developers 
to think about deprecation cycle if contemplating a change, thus providing some 
back pressure against snap decisions. Or, if not then, then at release 
candidate evaluation time, user configuration breaking changes could be caught 
be a release automation tool that diffs LP(CONFIG) annotated classes. Something 
like this would improve the state of configuration key management quite 
dramatically, because currently it’s ad hoc. 


> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault 
> <[email protected]> wrote:
> 
> Thanks for your detailed response, Nick!
> 
>> I think that none of my comments address your intended topic: how do we
> publish our configuration points as an API that can be consumed by user
> applications? (Do I have that correct?)
> 
> This is a good summary, and I appreciate the other thoughts/clarifications
> as well. I also realize this is probably hard to get perfect and any choice
> must be weighed against the effort necessary to change/maintain.
> 
> One example I know is Hadoop/HDFS, and I bet some on this list have much
> more knowledge of that project's history than I do. For HDFS they have
> DFSConfigKeys which in my experience does seem to include most configs. I
> believe they even have unit tests which verify that all configs in the
> various site.xml files are represented in code. In more recent versions
> they have split that class up into smaller groupings, for example
> DfsClientConf and the various inner classes there.
> 
> In a vacuum, from a code design perspective, I'm not commenting on whether
> that's a good or bad pattern. I also don't know of the politics of the
> project or what sorts of pain points they've discovered in that pattern
> over the years. But from *user's perspective*, this is a handy way to
> handle things in my opinion.
> 
> At my company, in general we try to avoid "magic strings" [1] and instead
> always try to use constants. We can and do define our own constants to try
> to mirror some of the "private" magic strings in the hbase client. This is
> better than nothing but even better would be to use hbase-provided
> constants so that we can build more defensive applications, using the
> compiler to verify that the configs we reference still do anything.
> 
> I unfortunately can't speak to the original issues with HConstants that
> turned it into an anti-pattern. What I do notice is there are definitely
> examples in the hbase codebase of duplicated config strings, one of which
> is called out in one of the jiras I linked in my original email. These are
> just bugs waiting to happen in my opinion, either for hbase itself or for
> users which may reference them.
> 
> [1] https://deviq.com/antipatterns/magic-strings
>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]> wrote:
>> 
>> Hi Bryan,
>> 
>> Thanks for bringing this up.
>> 
>> I agree with Duo (and I think we have this settled as project-wide
>> consensus) that HConstants is/was an anti-pattern, that we are actively
>> against adding new fields there, and opportunistically removing fields when
>> we can. Further, the documented meaning of the
>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear in
>> user facing configuration files", so this isn't really appropriate for
>> marking a field that exposes a configuration key to user applications. I
>> will also note that there appears to be two categories of tunable
>> parameters -- configuration points that we expect users to tweak are
>> catalogued and documented in the book [0] and everything else is left to
>> the obscurity of code-grep.
>> 
>> While we are actively squashing use of fields in HConstants, I don't know
>> that we have proposed some alternative to the user community. For my part,
>> when I write and review code that involves configuration keys, I generally
>> implement the key constant string as a private field in an appropriate
>> class, and the unit test coverage for that configuration key replicates the
>> string in the test. My reasoning being that the string is a part of our
>> public API and making a change to the public API should be detected from
>> the unit test. I also have (on occasion) gone out of my way to write about
>> the configuration keys in the package or class-level javadoc.
>> 
>> I think that none of my comments address your intended topic: how do we
>> publish our configuration points as an API that can be consumed by user
>> applications? (Do I have that correct?)
>> 
>> I am of the mind that we don't need/want an API of configurations ; we want
>> a catalogue, i.e., what has been started in our book. Perhaps accompanied
>> by/generated from an authoritative hbase-defaults.xml file. In fact, we
>> already do generate from hbase-default.xml, the result is [1] ; I don't
>> believe it is authoritative.
>> 
>> If we did have an AP thoughI, what would be better than the HConstants
>> approach of key-strings as public fields ? What if we had a
>> ConfigurationBuilder type of class, which had methods tied to configuration
>> keys? I would think that such a globally applicable class would have the
>> same maintenance issues as HConstants. But what if we had some kind of
>> ConfigurationSetter class, perhaps per package, that performed this
>> function? That might be maintainable for us and useful for users.
>> 
>> I'm keen to hear what other ideas are out there, or better, examples and
>> counter-examples from other projects.
>> 
>> Thanks,
>> Nick
>> 
>> [0]: https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> 
>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>> <[email protected]> wrote:
>> 
>>> Hi devs,
>>> 
>>> As a major user of hbase, my company has thousands of clients deployed
>>> which use the hbase client to connect to a variety of hbase clusters. We
>>> have a common library which handles configuring all clients by setting up
>>> the Configuration object prior to creating a Connection. Our library sets
>>> configurations using the various configs in HConstants, but there are
>> also
>>> a bunch of configs which don't exist in HConstants. For these we have
>>> hardcoded config strings in our client.
>>> 
>>> We're now working on an hbase upgrade and need to go through our client
>>> library and check how the configs may have changed in the new version.
>> This
>>> is relatively easy to do for those HConstants cases -- configs may be
>>> marked @Deprecated which will show up in one's editor, they may be
>> removed
>>> entirely which would show up is a compile error, and otherwise one can
>>> easily click through or bring up the javadoc. For the others that don't
>>> exist in HConstants, we need to go manually search the hbase codebase for
>>> those strings.
>>> 
>>> Without doing this painstaking manual process, we would potentially
>> deploy
>>> the upgraded client with configs which are no longer used or deprecated
>> by
>>> the hbase client. For those using HConstants, this is immediately obvious
>>> because the HConstant field may have been removed. This is a clear
>>> indication of needing to investigate the config. In this case it's
>>> preferred to face the compile failure because it's clearer than having
>>> something silently disappear or change.
>>> 
>>> I opened 3 jiras to move some fields to HConstants, but got some
>> reasonable
>>> pushback from Duo:
>>> 
>>> https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>> https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>> https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>> 
>>> Duo's pushback is that HConstants is an anti-pattern and these configs
>> are
>>> not part of our public API. I can agree that a catch-all constants class
>>> might be an anti-pattern, but would argue that consolidating configs
>> there
>>> is very useful for end-users. I can also potentially agree that exposing
>>> these as part of our public API might limit the flexibility of
>> development
>>> due to compatibility constraints about IA.Public.
>>> 
>>> To me it seems odd to add a configuration, whose whole point is to make
>>> something tuneable, but then bury it in a private class despite having
>> real
>>> implications for how the application runs. If a configuration is not
>> meant
>>> to be tuned, it shouldn't be a configuration at all. Otherwise it should
>> be
>>> exposed for reference.
>>> 
>>> I'm wondering if there is some compromise we can achieve which makes it
>>> easier for end-users to integrate with tunable configs.
>>> 
>>> One can imagine a large project to clean up all of our configs under some
>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
>>> (needing to migrate all configs) the enemy of good.
>>> 
>>> A better option might be to make those classes which expose configs
>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>>> ConnectionImplementation. That might be the most incremental change we
>>> could make. We could handle this on a case-by-case basis.
>>> 
>>> Does anyone have any thoughts?
>>> 
>> 

Reply via email to