I realized there already exists a good candidate for this in hbase-client -- ConnectionConfiguration. My latest commit in https://github.com/apache/hbase/pull/4180 adds a new config constant there and marks it as LP(Config), but I'd also be happy to revert that part of the PR and instead handle that in a dedicated jira for this topic if desired.
On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <[email protected]> wrote: > Thank you both for the input here! > > It seems like we've come to the conclusion that it would be ok to: > > - Update docs around LP(CONFIG) to say that it also encompasses some > module-aggregated classes which hold config constants > - Maybe document this convention elsewhere for contributors as well, > though need to look into where (guidance welcome, but will look around) > - I can create the first constant class in hbase-client for the use-cases > in my jira. I also have another config constant I've added in > https://github.com/apache/hbase/pull/4180 prior to this discussion, so > may want to use that class there. > - Look into updates to tooling to audit the LP(CONFIG) classes > > If this sounds good, I'll create a JIRA to track the work. The first 3 > todos will be relatively easy, and I'll look into options for the audit tool > > > On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <[email protected]> > wrote: > >> Agreed the definition of LP(CONFIG) would need to be tweaked. >> >> We do not quite have enough support for analyzing configuration key set >> changes in the current API audit tool. Removal of a public constant field >> from an LP annotated class would be flagged but a modification of the >> constant would not. However it is a OSS project consisting of Perl scripts >> that might accept a contribution or at least could be patched or extended. >> Or we can build a new audit tool for the purpose, which is what I would >> recommend. The Perl based tool shells out to javah and is quite expensive, >> and on some platforms, depending on comparison, can fail due to command >> line length limits. A Java tool would likely be more efficient at >> processing Java class annotations and introspecting string constants. >> >> > >> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]> wrote: >> > >> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell < >> [email protected]> >> > wrote: >> > >> >> Although collecting all configuration keys into a single file is >> >> definitely an anti-pattern I’m not sure the same is true of package or >> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like >> >> DFSConfigKeys but geared toward our API/release auditing. >> >> >> >> This would seem virtuous for a couple of reasons. Relevant >> configuration >> >> key constants for the package or module would be grouped in a well >> known >> >> place for users and developers alike. The LP(CONFIG) designation would >> >> require developers to think about deprecation cycle if contemplating a >> >> change, thus providing some back pressure against snap decisions. Or, >> if >> >> not then, then at release candidate evaluation time, user configuration >> >> breaking changes could be caught be a release automation tool that >> diffs >> >> LP(CONFIG) annotated classes. Something like this would improve the >> state >> >> of configuration key management quite dramatically, because currently >> it’s >> >> ad hoc. >> >> >> > >> > I am supportive of trying such an effort. We'll need to tweak the >> meaning >> > of LP(CONFIG) as we define it currently, but that can be done. I don't >> know >> > what our current tools do or assume regarding this annotation. I think >> > there is something custom happening in the compatibility reports that we >> > generate as part of each RC. >> > >> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault >> <[email protected]> >> >>> wrote: >> >>> >> >>> Thanks for your detailed response, Nick! >> >>> >> >>>> I think that none of my comments address your intended topic: how do >> we >> >>> publish our configuration points as an API that can be consumed by >> user >> >>> applications? (Do I have that correct?) >> >>> >> >>> This is a good summary, and I appreciate the other >> >> thoughts/clarifications >> >>> as well. I also realize this is probably hard to get perfect and any >> >> choice >> >>> must be weighed against the effort necessary to change/maintain. >> >>> >> >>> One example I know is Hadoop/HDFS, and I bet some on this list have >> much >> >>> more knowledge of that project's history than I do. For HDFS they have >> >>> DFSConfigKeys which in my experience does seem to include most >> configs. I >> >>> believe they even have unit tests which verify that all configs in the >> >>> various site.xml files are represented in code. In more recent >> versions >> >>> they have split that class up into smaller groupings, for example >> >>> DfsClientConf and the various inner classes there. >> >>> >> >>> In a vacuum, from a code design perspective, I'm not commenting on >> >> whether >> >>> that's a good or bad pattern. I also don't know of the politics of the >> >>> project or what sorts of pain points they've discovered in that >> pattern >> >>> over the years. But from *user's perspective*, this is a handy way to >> >>> handle things in my opinion. >> >>> >> >>> At my company, in general we try to avoid "magic strings" [1] and >> instead >> >>> always try to use constants. We can and do define our own constants to >> >> try >> >>> to mirror some of the "private" magic strings in the hbase client. >> This >> >> is >> >>> better than nothing but even better would be to use hbase-provided >> >>> constants so that we can build more defensive applications, using the >> >>> compiler to verify that the configs we reference still do anything. >> >>> >> >>> I unfortunately can't speak to the original issues with HConstants >> that >> >>> turned it into an anti-pattern. What I do notice is there are >> definitely >> >>> examples in the hbase codebase of duplicated config strings, one of >> which >> >>> is called out in one of the jiras I linked in my original email. These >> >> are >> >>> just bugs waiting to happen in my opinion, either for hbase itself or >> for >> >>> users which may reference them. >> >>> >> >>> [1] https://deviq.com/antipatterns/magic-strings >> <https://deviq.com/antipatterns/magic-strings> >> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]> >> >> wrote: >> >>>> >> >>>> Hi Bryan, >> >>>> >> >>>> Thanks for bringing this up. >> >>>> >> >>>> I agree with Duo (and I think we have this settled as project-wide >> >>>> consensus) that HConstants is/was an anti-pattern, that we are >> actively >> >>>> against adding new fields there, and opportunistically removing >> fields >> >> when >> >>>> we can. Further, the documented meaning of the >> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that >> appear >> >> in >> >>>> user facing configuration files", so this isn't really appropriate >> for >> >>>> marking a field that exposes a configuration key to user >> applications. I >> >>>> will also note that there appears to be two categories of tunable >> >>>> parameters -- configuration points that we expect users to tweak are >> >>>> catalogued and documented in the book [0] and everything else is >> left to >> >>>> the obscurity of code-grep. >> >>>> >> >>>> While we are actively squashing use of fields in HConstants, I don't >> >> know >> >>>> that we have proposed some alternative to the user community. For my >> >> part, >> >>>> when I write and review code that involves configuration keys, I >> >> generally >> >>>> implement the key constant string as a private field in an >> appropriate >> >>>> class, and the unit test coverage for that configuration key >> replicates >> >> the >> >>>> string in the test. My reasoning being that the string is a part of >> our >> >>>> public API and making a change to the public API should be detected >> from >> >>>> the unit test. I also have (on occasion) gone out of my way to write >> >> about >> >>>> the configuration keys in the package or class-level javadoc. >> >>>> >> >>>> I think that none of my comments address your intended topic: how do >> we >> >>>> publish our configuration points as an API that can be consumed by >> user >> >>>> applications? (Do I have that correct?) >> >>>> >> >>>> I am of the mind that we don't need/want an API of configurations ; >> we >> >> want >> >>>> a catalogue, i.e., what has been started in our book. Perhaps >> >> accompanied >> >>>> by/generated from an authoritative hbase-defaults.xml file. In fact, >> we >> >>>> already do generate from hbase-default.xml, the result is [1] ; I >> don't >> >>>> believe it is authoritative. >> >>>> >> >>>> If we did have an AP thoughI, what would be better than the >> HConstants >> >>>> approach of key-strings as public fields ? What if we had a >> >>>> ConfigurationBuilder type of class, which had methods tied to >> >> configuration >> >>>> keys? I would think that such a globally applicable class would have >> the >> >>>> same maintenance issues as HConstants. But what if we had some kind >> of >> >>>> ConfigurationSetter class, perhaps per package, that performed this >> >>>> function? That might be maintainable for us and useful for users. >> >>>> >> >>>> I'm keen to hear what other ideas are out there, or better, examples >> and >> >>>> counter-examples from other projects. >> >>>> >> >>>> Thanks, >> >>>> Nick >> >>>> >> >>>> [0]: https://hbase.apache.org/book.html#important_configurations >> <https://hbase.apache.org/book.html#important_configurations> >> >>>> <https://hbase.apache.org/book.html#important_configurations >> <https://hbase.apache.org/book.html#important_configurations> >> > >> >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations >> <https://hbase.apache.org/book.html#hbase_default_configurations> >> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations >> <https://hbase.apache.org/book.html#hbase_default_configurations> >> > >> >>>> >> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault >> >>>> <[email protected]> wrote: >> >>>> >> >>>>> Hi devs, >> >>>>> >> >>>>> As a major user of hbase, my company has thousands of clients >> deployed >> >>>>> which use the hbase client to connect to a variety of hbase >> clusters. >> >> We >> >>>>> have a common library which handles configuring all clients by >> setting >> >> up >> >>>>> the Configuration object prior to creating a Connection. Our library >> >> sets >> >>>>> configurations using the various configs in HConstants, but there >> are >> >>>> also >> >>>>> a bunch of configs which don't exist in HConstants. For these we >> have >> >>>>> hardcoded config strings in our client. >> >>>>> >> >>>>> We're now working on an hbase upgrade and need to go through our >> client >> >>>>> library and check how the configs may have changed in the new >> version. >> >>>> This >> >>>>> is relatively easy to do for those HConstants cases -- configs may >> be >> >>>>> marked @Deprecated which will show up in one's editor, they may be >> >>>> removed >> >>>>> entirely which would show up is a compile error, and otherwise one >> can >> >>>>> easily click through or bring up the javadoc. For the others that >> don't >> >>>>> exist in HConstants, we need to go manually search the hbase >> codebase >> >> for >> >>>>> those strings. >> >>>>> >> >>>>> Without doing this painstaking manual process, we would potentially >> >>>> deploy >> >>>>> the upgraded client with configs which are no longer used or >> deprecated >> >>>> by >> >>>>> the hbase client. For those using HConstants, this is immediately >> >> obvious >> >>>>> because the HConstant field may have been removed. This is a clear >> >>>>> indication of needing to investigate the config. In this case it's >> >>>>> preferred to face the compile failure because it's clearer than >> having >> >>>>> something silently disappear or change. >> >>>>> >> >>>>> I opened 3 jiras to move some fields to HConstants, but got some >> >>>> reasonable >> >>>>> pushback from Duo: >> >>>>> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26845 >> <https://issues.apache.org/jira/browse/HBASE-26845> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26845 >> <https://issues.apache.org/jira/browse/HBASE-26845> >> > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26846 >> <https://issues.apache.org/jira/browse/HBASE-26846> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26846 >> <https://issues.apache.org/jira/browse/HBASE-26846> >> > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26847 >> <https://issues.apache.org/jira/browse/HBASE-26847> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26847 >> <https://issues.apache.org/jira/browse/HBASE-26847> >> > >> >>>>> >> >>>>> Duo's pushback is that HConstants is an anti-pattern and these >> configs >> >>>> are >> >>>>> not part of our public API. I can agree that a catch-all constants >> >> class >> >>>>> might be an anti-pattern, but would argue that consolidating configs >> >>>> there >> >>>>> is very useful for end-users. I can also potentially agree that >> >> exposing >> >>>>> these as part of our public API might limit the flexibility of >> >>>> development >> >>>>> due to compatibility constraints about IA.Public. >> >>>>> >> >>>>> To me it seems odd to add a configuration, whose whole point is to >> make >> >>>>> something tuneable, but then bury it in a private class despite >> having >> >>>> real >> >>>>> implications for how the application runs. If a configuration is not >> >>>> meant >> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it >> >> should >> >>>> be >> >>>>> exposed for reference. >> >>>>> >> >>>>> I'm wondering if there is some compromise we can achieve which >> makes it >> >>>>> easier for end-users to integrate with tunable configs. >> >>>>> >> >>>>> One can imagine a large project to clean up all of our configs under >> >> some >> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making >> >> perfect >> >>>>> (needing to migrate all configs) the enemy of good. >> >>>>> >> >>>>> A better option might be to make those classes which expose configs >> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and >> >>>>> ConnectionImplementation. That might be the most incremental change >> we >> >>>>> could make. We could handle this on a case-by-case basis. >> >>>>> >> >>>>> Does anyone have any thoughts? >> >>>>> >> >>>> >> >> >> >
