On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <[email protected]> wrote:
> Although collecting all configuration keys into a single file is > definitely an anti-pattern I’m not sure the same is true of package or > Maven module level aggregation classes marked LP(CONFIG).Somewhat like > DFSConfigKeys but geared toward our API/release auditing. > > This would seem virtuous for a couple of reasons. Relevant configuration > key constants for the package or module would be grouped in a well known > place for users and developers alike. The LP(CONFIG) designation would > require developers to think about deprecation cycle if contemplating a > change, thus providing some back pressure against snap decisions. Or, if > not then, then at release candidate evaluation time, user configuration > breaking changes could be caught be a release automation tool that diffs > LP(CONFIG) annotated classes. Something like this would improve the state > of configuration key management quite dramatically, because currently it’s > ad hoc. > I am supportive of trying such an effort. We'll need to tweak the meaning of LP(CONFIG) as we define it currently, but that can be done. I don't know what our current tools do or assume regarding this annotation. I think there is something custom happening in the compatibility reports that we generate as part of each RC. > On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault > <[email protected]> > wrote: > > > > Thanks for your detailed response, Nick! > > > >> I think that none of my comments address your intended topic: how do we > > publish our configuration points as an API that can be consumed by user > > applications? (Do I have that correct?) > > > > This is a good summary, and I appreciate the other > thoughts/clarifications > > as well. I also realize this is probably hard to get perfect and any > choice > > must be weighed against the effort necessary to change/maintain. > > > > One example I know is Hadoop/HDFS, and I bet some on this list have much > > more knowledge of that project's history than I do. For HDFS they have > > DFSConfigKeys which in my experience does seem to include most configs. I > > believe they even have unit tests which verify that all configs in the > > various site.xml files are represented in code. In more recent versions > > they have split that class up into smaller groupings, for example > > DfsClientConf and the various inner classes there. > > > > In a vacuum, from a code design perspective, I'm not commenting on > whether > > that's a good or bad pattern. I also don't know of the politics of the > > project or what sorts of pain points they've discovered in that pattern > > over the years. But from *user's perspective*, this is a handy way to > > handle things in my opinion. > > > > At my company, in general we try to avoid "magic strings" [1] and instead > > always try to use constants. We can and do define our own constants to > try > > to mirror some of the "private" magic strings in the hbase client. This > is > > better than nothing but even better would be to use hbase-provided > > constants so that we can build more defensive applications, using the > > compiler to verify that the configs we reference still do anything. > > > > I unfortunately can't speak to the original issues with HConstants that > > turned it into an anti-pattern. What I do notice is there are definitely > > examples in the hbase codebase of duplicated config strings, one of which > > is called out in one of the jiras I linked in my original email. These > are > > just bugs waiting to happen in my opinion, either for hbase itself or for > > users which may reference them. > > > > [1] https://deviq.com/antipatterns/magic-strings > >> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]> > wrote: > >> > >> Hi Bryan, > >> > >> Thanks for bringing this up. > >> > >> I agree with Duo (and I think we have this settled as project-wide > >> consensus) that HConstants is/was an anti-pattern, that we are actively > >> against adding new fields there, and opportunistically removing fields > when > >> we can. Further, the documented meaning of the > >> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear > in > >> user facing configuration files", so this isn't really appropriate for > >> marking a field that exposes a configuration key to user applications. I > >> will also note that there appears to be two categories of tunable > >> parameters -- configuration points that we expect users to tweak are > >> catalogued and documented in the book [0] and everything else is left to > >> the obscurity of code-grep. > >> > >> While we are actively squashing use of fields in HConstants, I don't > know > >> that we have proposed some alternative to the user community. For my > part, > >> when I write and review code that involves configuration keys, I > generally > >> implement the key constant string as a private field in an appropriate > >> class, and the unit test coverage for that configuration key replicates > the > >> string in the test. My reasoning being that the string is a part of our > >> public API and making a change to the public API should be detected from > >> the unit test. I also have (on occasion) gone out of my way to write > about > >> the configuration keys in the package or class-level javadoc. > >> > >> I think that none of my comments address your intended topic: how do we > >> publish our configuration points as an API that can be consumed by user > >> applications? (Do I have that correct?) > >> > >> I am of the mind that we don't need/want an API of configurations ; we > want > >> a catalogue, i.e., what has been started in our book. Perhaps > accompanied > >> by/generated from an authoritative hbase-defaults.xml file. In fact, we > >> already do generate from hbase-default.xml, the result is [1] ; I don't > >> believe it is authoritative. > >> > >> If we did have an AP thoughI, what would be better than the HConstants > >> approach of key-strings as public fields ? What if we had a > >> ConfigurationBuilder type of class, which had methods tied to > configuration > >> keys? I would think that such a globally applicable class would have the > >> same maintenance issues as HConstants. But what if we had some kind of > >> ConfigurationSetter class, perhaps per package, that performed this > >> function? That might be maintainable for us and useful for users. > >> > >> I'm keen to hear what other ideas are out there, or better, examples and > >> counter-examples from other projects. > >> > >> Thanks, > >> Nick > >> > >> [0]: https://hbase.apache.org/book.html#important_configurations > >> <https://hbase.apache.org/book.html#important_configurations> > >> [1]: https://hbase.apache.org/book.html#hbase_default_configurations > >> <https://hbase.apache.org/book.html#hbase_default_configurations> > >> > >> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault > >> <[email protected]> wrote: > >> > >>> Hi devs, > >>> > >>> As a major user of hbase, my company has thousands of clients deployed > >>> which use the hbase client to connect to a variety of hbase clusters. > We > >>> have a common library which handles configuring all clients by setting > up > >>> the Configuration object prior to creating a Connection. Our library > sets > >>> configurations using the various configs in HConstants, but there are > >> also > >>> a bunch of configs which don't exist in HConstants. For these we have > >>> hardcoded config strings in our client. > >>> > >>> We're now working on an hbase upgrade and need to go through our client > >>> library and check how the configs may have changed in the new version. > >> This > >>> is relatively easy to do for those HConstants cases -- configs may be > >>> marked @Deprecated which will show up in one's editor, they may be > >> removed > >>> entirely which would show up is a compile error, and otherwise one can > >>> easily click through or bring up the javadoc. For the others that don't > >>> exist in HConstants, we need to go manually search the hbase codebase > for > >>> those strings. > >>> > >>> Without doing this painstaking manual process, we would potentially > >> deploy > >>> the upgraded client with configs which are no longer used or deprecated > >> by > >>> the hbase client. For those using HConstants, this is immediately > obvious > >>> because the HConstant field may have been removed. This is a clear > >>> indication of needing to investigate the config. In this case it's > >>> preferred to face the compile failure because it's clearer than having > >>> something silently disappear or change. > >>> > >>> I opened 3 jiras to move some fields to HConstants, but got some > >> reasonable > >>> pushback from Duo: > >>> > >>> https://issues.apache.org/jira/browse/HBASE-26845 > >> <https://issues.apache.org/jira/browse/HBASE-26845> > >>> https://issues.apache.org/jira/browse/HBASE-26846 > >> <https://issues.apache.org/jira/browse/HBASE-26846> > >>> https://issues.apache.org/jira/browse/HBASE-26847 > >> <https://issues.apache.org/jira/browse/HBASE-26847> > >>> > >>> Duo's pushback is that HConstants is an anti-pattern and these configs > >> are > >>> not part of our public API. I can agree that a catch-all constants > class > >>> might be an anti-pattern, but would argue that consolidating configs > >> there > >>> is very useful for end-users. I can also potentially agree that > exposing > >>> these as part of our public API might limit the flexibility of > >> development > >>> due to compatibility constraints about IA.Public. > >>> > >>> To me it seems odd to add a configuration, whose whole point is to make > >>> something tuneable, but then bury it in a private class despite having > >> real > >>> implications for how the application runs. If a configuration is not > >> meant > >>> to be tuned, it shouldn't be a configuration at all. Otherwise it > should > >> be > >>> exposed for reference. > >>> > >>> I'm wondering if there is some compromise we can achieve which makes it > >>> easier for end-users to integrate with tunable configs. > >>> > >>> One can imagine a large project to clean up all of our configs under > some > >>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making > perfect > >>> (needing to migrate all configs) the enemy of good. > >>> > >>> A better option might be to make those classes which expose configs > >>> LimitedPrivate(CONFIG) -- for example AsyncProcess and > >>> ConnectionImplementation. That might be the most incremental change we > >>> could make. We could handle this on a case-by-case basis. > >>> > >>> Does anyone have any thoughts? > >>> > >> >
