Thank you both for the input here! It seems like we've come to the conclusion that it would be ok to:
- Update docs around LP(CONFIG) to say that it also encompasses some module-aggregated classes which hold config constants - Maybe document this convention elsewhere for contributors as well, though need to look into where (guidance welcome, but will look around) - I can create the first constant class in hbase-client for the use-cases in my jira. I also have another config constant I've added in https://github.com/apache/hbase/pull/4180 prior to this discussion, so may want to use that class there. - Look into updates to tooling to audit the LP(CONFIG) classes If this sounds good, I'll create a JIRA to track the work. The first 3 todos will be relatively easy, and I'll look into options for the audit tool On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <[email protected]> wrote: > Agreed the definition of LP(CONFIG) would need to be tweaked. > > We do not quite have enough support for analyzing configuration key set > changes in the current API audit tool. Removal of a public constant field > from an LP annotated class would be flagged but a modification of the > constant would not. However it is a OSS project consisting of Perl scripts > that might accept a contribution or at least could be patched or extended. > Or we can build a new audit tool for the purpose, which is what I would > recommend. The Perl based tool shells out to javah and is quite expensive, > and on some platforms, depending on comparison, can fail due to command > line length limits. A Java tool would likely be more efficient at > processing Java class annotations and introspecting string constants. > > > > > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]> wrote: > > > > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell < > [email protected]> > > wrote: > > > >> Although collecting all configuration keys into a single file is > >> definitely an anti-pattern I’m not sure the same is true of package or > >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like > >> DFSConfigKeys but geared toward our API/release auditing. > >> > >> This would seem virtuous for a couple of reasons. Relevant configuration > >> key constants for the package or module would be grouped in a well known > >> place for users and developers alike. The LP(CONFIG) designation would > >> require developers to think about deprecation cycle if contemplating a > >> change, thus providing some back pressure against snap decisions. Or, if > >> not then, then at release candidate evaluation time, user configuration > >> breaking changes could be caught be a release automation tool that diffs > >> LP(CONFIG) annotated classes. Something like this would improve the > state > >> of configuration key management quite dramatically, because currently > it’s > >> ad hoc. > >> > > > > I am supportive of trying such an effort. We'll need to tweak the meaning > > of LP(CONFIG) as we define it currently, but that can be done. I don't > know > > what our current tools do or assume regarding this annotation. I think > > there is something custom happening in the compatibility reports that we > > generate as part of each RC. > > > >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault > <[email protected]> > >>> wrote: > >>> > >>> Thanks for your detailed response, Nick! > >>> > >>>> I think that none of my comments address your intended topic: how do > we > >>> publish our configuration points as an API that can be consumed by user > >>> applications? (Do I have that correct?) > >>> > >>> This is a good summary, and I appreciate the other > >> thoughts/clarifications > >>> as well. I also realize this is probably hard to get perfect and any > >> choice > >>> must be weighed against the effort necessary to change/maintain. > >>> > >>> One example I know is Hadoop/HDFS, and I bet some on this list have > much > >>> more knowledge of that project's history than I do. For HDFS they have > >>> DFSConfigKeys which in my experience does seem to include most > configs. I > >>> believe they even have unit tests which verify that all configs in the > >>> various site.xml files are represented in code. In more recent versions > >>> they have split that class up into smaller groupings, for example > >>> DfsClientConf and the various inner classes there. > >>> > >>> In a vacuum, from a code design perspective, I'm not commenting on > >> whether > >>> that's a good or bad pattern. I also don't know of the politics of the > >>> project or what sorts of pain points they've discovered in that pattern > >>> over the years. But from *user's perspective*, this is a handy way to > >>> handle things in my opinion. > >>> > >>> At my company, in general we try to avoid "magic strings" [1] and > instead > >>> always try to use constants. We can and do define our own constants to > >> try > >>> to mirror some of the "private" magic strings in the hbase client. This > >> is > >>> better than nothing but even better would be to use hbase-provided > >>> constants so that we can build more defensive applications, using the > >>> compiler to verify that the configs we reference still do anything. > >>> > >>> I unfortunately can't speak to the original issues with HConstants that > >>> turned it into an anti-pattern. What I do notice is there are > definitely > >>> examples in the hbase codebase of duplicated config strings, one of > which > >>> is called out in one of the jiras I linked in my original email. These > >> are > >>> just bugs waiting to happen in my opinion, either for hbase itself or > for > >>> users which may reference them. > >>> > >>> [1] https://deviq.com/antipatterns/magic-strings > <https://deviq.com/antipatterns/magic-strings> > >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]> > >> wrote: > >>>> > >>>> Hi Bryan, > >>>> > >>>> Thanks for bringing this up. > >>>> > >>>> I agree with Duo (and I think we have this settled as project-wide > >>>> consensus) that HConstants is/was an anti-pattern, that we are > actively > >>>> against adding new fields there, and opportunistically removing fields > >> when > >>>> we can. Further, the documented meaning of the > >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that > appear > >> in > >>>> user facing configuration files", so this isn't really appropriate for > >>>> marking a field that exposes a configuration key to user > applications. I > >>>> will also note that there appears to be two categories of tunable > >>>> parameters -- configuration points that we expect users to tweak are > >>>> catalogued and documented in the book [0] and everything else is left > to > >>>> the obscurity of code-grep. > >>>> > >>>> While we are actively squashing use of fields in HConstants, I don't > >> know > >>>> that we have proposed some alternative to the user community. For my > >> part, > >>>> when I write and review code that involves configuration keys, I > >> generally > >>>> implement the key constant string as a private field in an appropriate > >>>> class, and the unit test coverage for that configuration key > replicates > >> the > >>>> string in the test. My reasoning being that the string is a part of > our > >>>> public API and making a change to the public API should be detected > from > >>>> the unit test. I also have (on occasion) gone out of my way to write > >> about > >>>> the configuration keys in the package or class-level javadoc. > >>>> > >>>> I think that none of my comments address your intended topic: how do > we > >>>> publish our configuration points as an API that can be consumed by > user > >>>> applications? (Do I have that correct?) > >>>> > >>>> I am of the mind that we don't need/want an API of configurations ; we > >> want > >>>> a catalogue, i.e., what has been started in our book. Perhaps > >> accompanied > >>>> by/generated from an authoritative hbase-defaults.xml file. In fact, > we > >>>> already do generate from hbase-default.xml, the result is [1] ; I > don't > >>>> believe it is authoritative. > >>>> > >>>> If we did have an AP thoughI, what would be better than the HConstants > >>>> approach of key-strings as public fields ? What if we had a > >>>> ConfigurationBuilder type of class, which had methods tied to > >> configuration > >>>> keys? I would think that such a globally applicable class would have > the > >>>> same maintenance issues as HConstants. But what if we had some kind of > >>>> ConfigurationSetter class, perhaps per package, that performed this > >>>> function? That might be maintainable for us and useful for users. > >>>> > >>>> I'm keen to hear what other ideas are out there, or better, examples > and > >>>> counter-examples from other projects. > >>>> > >>>> Thanks, > >>>> Nick > >>>> > >>>> [0]: https://hbase.apache.org/book.html#important_configurations > <https://hbase.apache.org/book.html#important_configurations> > >>>> <https://hbase.apache.org/book.html#important_configurations > <https://hbase.apache.org/book.html#important_configurations> > > > >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations > <https://hbase.apache.org/book.html#hbase_default_configurations> > >>>> <https://hbase.apache.org/book.html#hbase_default_configurations > <https://hbase.apache.org/book.html#hbase_default_configurations> > > > >>>> > >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault > >>>> <[email protected]> wrote: > >>>> > >>>>> Hi devs, > >>>>> > >>>>> As a major user of hbase, my company has thousands of clients > deployed > >>>>> which use the hbase client to connect to a variety of hbase clusters. > >> We > >>>>> have a common library which handles configuring all clients by > setting > >> up > >>>>> the Configuration object prior to creating a Connection. Our library > >> sets > >>>>> configurations using the various configs in HConstants, but there are > >>>> also > >>>>> a bunch of configs which don't exist in HConstants. For these we have > >>>>> hardcoded config strings in our client. > >>>>> > >>>>> We're now working on an hbase upgrade and need to go through our > client > >>>>> library and check how the configs may have changed in the new > version. > >>>> This > >>>>> is relatively easy to do for those HConstants cases -- configs may be > >>>>> marked @Deprecated which will show up in one's editor, they may be > >>>> removed > >>>>> entirely which would show up is a compile error, and otherwise one > can > >>>>> easily click through or bring up the javadoc. For the others that > don't > >>>>> exist in HConstants, we need to go manually search the hbase codebase > >> for > >>>>> those strings. > >>>>> > >>>>> Without doing this painstaking manual process, we would potentially > >>>> deploy > >>>>> the upgraded client with configs which are no longer used or > deprecated > >>>> by > >>>>> the hbase client. For those using HConstants, this is immediately > >> obvious > >>>>> because the HConstant field may have been removed. This is a clear > >>>>> indication of needing to investigate the config. In this case it's > >>>>> preferred to face the compile failure because it's clearer than > having > >>>>> something silently disappear or change. > >>>>> > >>>>> I opened 3 jiras to move some fields to HConstants, but got some > >>>> reasonable > >>>>> pushback from Duo: > >>>>> > >>>>> https://issues.apache.org/jira/browse/HBASE-26845 > <https://issues.apache.org/jira/browse/HBASE-26845> > >>>> <https://issues.apache.org/jira/browse/HBASE-26845 > <https://issues.apache.org/jira/browse/HBASE-26845> > > > >>>>> https://issues.apache.org/jira/browse/HBASE-26846 > <https://issues.apache.org/jira/browse/HBASE-26846> > >>>> <https://issues.apache.org/jira/browse/HBASE-26846 > <https://issues.apache.org/jira/browse/HBASE-26846> > > > >>>>> https://issues.apache.org/jira/browse/HBASE-26847 > <https://issues.apache.org/jira/browse/HBASE-26847> > >>>> <https://issues.apache.org/jira/browse/HBASE-26847 > <https://issues.apache.org/jira/browse/HBASE-26847> > > > >>>>> > >>>>> Duo's pushback is that HConstants is an anti-pattern and these > configs > >>>> are > >>>>> not part of our public API. I can agree that a catch-all constants > >> class > >>>>> might be an anti-pattern, but would argue that consolidating configs > >>>> there > >>>>> is very useful for end-users. I can also potentially agree that > >> exposing > >>>>> these as part of our public API might limit the flexibility of > >>>> development > >>>>> due to compatibility constraints about IA.Public. > >>>>> > >>>>> To me it seems odd to add a configuration, whose whole point is to > make > >>>>> something tuneable, but then bury it in a private class despite > having > >>>> real > >>>>> implications for how the application runs. If a configuration is not > >>>> meant > >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it > >> should > >>>> be > >>>>> exposed for reference. > >>>>> > >>>>> I'm wondering if there is some compromise we can achieve which makes > it > >>>>> easier for end-users to integrate with tunable configs. > >>>>> > >>>>> One can imagine a large project to clean up all of our configs under > >> some > >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making > >> perfect > >>>>> (needing to migrate all configs) the enemy of good. > >>>>> > >>>>> A better option might be to make those classes which expose configs > >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and > >>>>> ConnectionImplementation. That might be the most incremental change > we > >>>>> could make. We could handle this on a case-by-case basis. > >>>>> > >>>>> Does anyone have any thoughts? > >>>>> > >>>> > >> >
