Thank you both for the input here!

It seems like we've come to the conclusion that it would be ok to:

- Update docs around LP(CONFIG) to say that it also encompasses some
module-aggregated classes which hold config constants
- Maybe document this convention elsewhere for contributors as well, though
need to look into where (guidance welcome, but will look around)
- I can create the first constant class in hbase-client for the use-cases
in my jira. I also have another config constant I've added in
https://github.com/apache/hbase/pull/4180 prior to this discussion, so may
want to use that class there.
- Look into updates to tooling to audit the LP(CONFIG) classes

If this sounds good, I'll create a JIRA to track the work. The first 3
todos will be relatively easy, and I'll look into options for the audit tool


On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <[email protected]>
wrote:

> Agreed the definition of LP(CONFIG) would need to be tweaked.
>
> We do not quite have enough support for analyzing configuration key set
> changes in the current API audit tool. Removal of a public constant field
> from an LP annotated class would be flagged but a modification of the
> constant would not. However it is a OSS project consisting of Perl scripts
> that might accept a contribution or at least could be patched or extended.
> Or we can build a new audit tool for the purpose, which is what I would
> recommend. The Perl based tool shells out to javah and is quite expensive,
> and on some platforms, depending on comparison, can fail due to command
> line length limits. A Java tool would likely be more efficient at
> processing Java class annotations and introspecting string constants.
>
> >
> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]> wrote:
> >
> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> [email protected]>
> > wrote:
> >
> >> Although collecting all configuration keys into a single file is
> >> definitely an anti-pattern I’m not sure the same is true of package or
> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
> >> DFSConfigKeys but geared toward our API/release auditing.
> >>
> >> This would seem virtuous for a couple of reasons. Relevant configuration
> >> key constants for the package or module would be grouped in a well known
> >> place for users and developers alike. The LP(CONFIG) designation would
> >> require developers to think about deprecation cycle if contemplating a
> >> change, thus providing some back pressure against snap decisions. Or, if
> >> not then, then at release candidate evaluation time, user configuration
> >> breaking changes could be caught be a release automation tool that diffs
> >> LP(CONFIG) annotated classes. Something like this would improve the
> state
> >> of configuration key management quite dramatically, because currently
> it’s
> >> ad hoc.
> >>
> >
> > I am supportive of trying such an effort. We'll need to tweak the meaning
> > of LP(CONFIG) as we define it currently, but that can be done. I don't
> know
> > what our current tools do or assume regarding this annotation. I think
> > there is something custom happening in the compatibility reports that we
> > generate as part of each RC.
> >
> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> <[email protected]>
> >>> wrote:
> >>>
> >>> Thanks for your detailed response, Nick!
> >>>
> >>>> I think that none of my comments address your intended topic: how do
> we
> >>> publish our configuration points as an API that can be consumed by user
> >>> applications? (Do I have that correct?)
> >>>
> >>> This is a good summary, and I appreciate the other
> >> thoughts/clarifications
> >>> as well. I also realize this is probably hard to get perfect and any
> >> choice
> >>> must be weighed against the effort necessary to change/maintain.
> >>>
> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
> much
> >>> more knowledge of that project's history than I do. For HDFS they have
> >>> DFSConfigKeys which in my experience does seem to include most
> configs. I
> >>> believe they even have unit tests which verify that all configs in the
> >>> various site.xml files are represented in code. In more recent versions
> >>> they have split that class up into smaller groupings, for example
> >>> DfsClientConf and the various inner classes there.
> >>>
> >>> In a vacuum, from a code design perspective, I'm not commenting on
> >> whether
> >>> that's a good or bad pattern. I also don't know of the politics of the
> >>> project or what sorts of pain points they've discovered in that pattern
> >>> over the years. But from *user's perspective*, this is a handy way to
> >>> handle things in my opinion.
> >>>
> >>> At my company, in general we try to avoid "magic strings" [1] and
> instead
> >>> always try to use constants. We can and do define our own constants to
> >> try
> >>> to mirror some of the "private" magic strings in the hbase client. This
> >> is
> >>> better than nothing but even better would be to use hbase-provided
> >>> constants so that we can build more defensive applications, using the
> >>> compiler to verify that the configs we reference still do anything.
> >>>
> >>> I unfortunately can't speak to the original issues with HConstants that
> >>> turned it into an anti-pattern. What I do notice is there are
> definitely
> >>> examples in the hbase codebase of duplicated config strings, one of
> which
> >>> is called out in one of the jiras I linked in my original email. These
> >> are
> >>> just bugs waiting to happen in my opinion, either for hbase itself or
> for
> >>> users which may reference them.
> >>>
> >>> [1] https://deviq.com/antipatterns/magic-strings
> <https://deviq.com/antipatterns/magic-strings>
> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]>
> >> wrote:
> >>>>
> >>>> Hi Bryan,
> >>>>
> >>>> Thanks for bringing this up.
> >>>>
> >>>> I agree with Duo (and I think we have this settled as project-wide
> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> actively
> >>>> against adding new fields there, and opportunistically removing fields
> >> when
> >>>> we can. Further, the documented meaning of the
> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
> appear
> >> in
> >>>> user facing configuration files", so this isn't really appropriate for
> >>>> marking a field that exposes a configuration key to user
> applications. I
> >>>> will also note that there appears to be two categories of tunable
> >>>> parameters -- configuration points that we expect users to tweak are
> >>>> catalogued and documented in the book [0] and everything else is left
> to
> >>>> the obscurity of code-grep.
> >>>>
> >>>> While we are actively squashing use of fields in HConstants, I don't
> >> know
> >>>> that we have proposed some alternative to the user community. For my
> >> part,
> >>>> when I write and review code that involves configuration keys, I
> >> generally
> >>>> implement the key constant string as a private field in an appropriate
> >>>> class, and the unit test coverage for that configuration key
> replicates
> >> the
> >>>> string in the test. My reasoning being that the string is a part of
> our
> >>>> public API and making a change to the public API should be detected
> from
> >>>> the unit test. I also have (on occasion) gone out of my way to write
> >> about
> >>>> the configuration keys in the package or class-level javadoc.
> >>>>
> >>>> I think that none of my comments address your intended topic: how do
> we
> >>>> publish our configuration points as an API that can be consumed by
> user
> >>>> applications? (Do I have that correct?)
> >>>>
> >>>> I am of the mind that we don't need/want an API of configurations ; we
> >> want
> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> >> accompanied
> >>>> by/generated from an authoritative hbase-defaults.xml file. In fact,
> we
> >>>> already do generate from hbase-default.xml, the result is [1] ; I
> don't
> >>>> believe it is authoritative.
> >>>>
> >>>> If we did have an AP thoughI, what would be better than the HConstants
> >>>> approach of key-strings as public fields ? What if we had a
> >>>> ConfigurationBuilder type of class, which had methods tied to
> >> configuration
> >>>> keys? I would think that such a globally applicable class would have
> the
> >>>> same maintenance issues as HConstants. But what if we had some kind of
> >>>> ConfigurationSetter class, perhaps per package, that performed this
> >>>> function? That might be maintainable for us and useful for users.
> >>>>
> >>>> I'm keen to hear what other ideas are out there, or better, examples
> and
> >>>> counter-examples from other projects.
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >>>> <https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >
> >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >
> >>>>
> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> As a major user of hbase, my company has thousands of clients
> deployed
> >>>>> which use the hbase client to connect to a variety of hbase clusters.
> >> We
> >>>>> have a common library which handles configuring all clients by
> setting
> >> up
> >>>>> the Configuration object prior to creating a Connection. Our library
> >> sets
> >>>>> configurations using the various configs in HConstants, but there are
> >>>> also
> >>>>> a bunch of configs which don't exist in HConstants. For these we have
> >>>>> hardcoded config strings in our client.
> >>>>>
> >>>>> We're now working on an hbase upgrade and need to go through our
> client
> >>>>> library and check how the configs may have changed in the new
> version.
> >>>> This
> >>>>> is relatively easy to do for those HConstants cases -- configs may be
> >>>>> marked @Deprecated which will show up in one's editor, they may be
> >>>> removed
> >>>>> entirely which would show up is a compile error, and otherwise one
> can
> >>>>> easily click through or bring up the javadoc. For the others that
> don't
> >>>>> exist in HConstants, we need to go manually search the hbase codebase
> >> for
> >>>>> those strings.
> >>>>>
> >>>>> Without doing this painstaking manual process, we would potentially
> >>>> deploy
> >>>>> the upgraded client with configs which are no longer used or
> deprecated
> >>>> by
> >>>>> the hbase client. For those using HConstants, this is immediately
> >> obvious
> >>>>> because the HConstant field may have been removed. This is a clear
> >>>>> indication of needing to investigate the config. In this case it's
> >>>>> preferred to face the compile failure because it's clearer than
> having
> >>>>> something silently disappear or change.
> >>>>>
> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
> >>>> reasonable
> >>>>> pushback from Duo:
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >
> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >
> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >
> >>>>>
> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> configs
> >>>> are
> >>>>> not part of our public API. I can agree that a catch-all constants
> >> class
> >>>>> might be an anti-pattern, but would argue that consolidating configs
> >>>> there
> >>>>> is very useful for end-users. I can also potentially agree that
> >> exposing
> >>>>> these as part of our public API might limit the flexibility of
> >>>> development
> >>>>> due to compatibility constraints about IA.Public.
> >>>>>
> >>>>> To me it seems odd to add a configuration, whose whole point is to
> make
> >>>>> something tuneable, but then bury it in a private class despite
> having
> >>>> real
> >>>>> implications for how the application runs. If a configuration is not
> >>>> meant
> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> >> should
> >>>> be
> >>>>> exposed for reference.
> >>>>>
> >>>>> I'm wondering if there is some compromise we can achieve which makes
> it
> >>>>> easier for end-users to integrate with tunable configs.
> >>>>>
> >>>>> One can imagine a large project to clean up all of our configs under
> >> some
> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> >> perfect
> >>>>> (needing to migrate all configs) the enemy of good.
> >>>>>
> >>>>> A better option might be to make those classes which expose configs
> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >>>>> ConnectionImplementation. That might be the most incremental change
> we
> >>>>> could make. We could handle this on a case-by-case basis.
> >>>>>
> >>>>> Does anyone have any thoughts?
> >>>>>
> >>>>
> >>
>

Reply via email to