I realized there already exists a good candidate for this in hbase-client
-- ConnectionConfiguration. My latest commit in
https://github.com/apache/hbase/pull/4180 adds a new config constant there
and marks it as LP(Config), but I'd also be happy to revert that part of
the PR and instead handle that in a dedicated jira for this topic if
desired.

On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <[email protected]>
wrote:

> Thank you both for the input here!
>
> It seems like we've come to the conclusion that it would be ok to:
>
> - Update docs around LP(CONFIG) to say that it also encompasses some
> module-aggregated classes which hold config constants
> - Maybe document this convention elsewhere for contributors as well,
> though need to look into where (guidance welcome, but will look around)
> - I can create the first constant class in hbase-client for the use-cases
> in my jira. I also have another config constant I've added in
> https://github.com/apache/hbase/pull/4180 prior to this discussion, so
> may want to use that class there.
> - Look into updates to tooling to audit the LP(CONFIG) classes
>
> If this sounds good, I'll create a JIRA to track the work. The first 3
> todos will be relatively easy, and I'll look into options for the audit tool
>
>
> On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <[email protected]>
> wrote:
>
>> Agreed the definition of LP(CONFIG) would need to be tweaked.
>>
>> We do not quite have enough support for analyzing configuration key set
>> changes in the current API audit tool. Removal of a public constant field
>> from an LP annotated class would be flagged but a modification of the
>> constant would not. However it is a OSS project consisting of Perl scripts
>> that might accept a contribution or at least could be patched or extended.
>> Or we can build a new audit tool for the purpose, which is what I would
>> recommend. The Perl based tool shells out to javah and is quite expensive,
>> and on some platforms, depending on comparison, can fail due to command
>> line length limits. A Java tool would likely be more efficient at
>> processing Java class annotations and introspecting string constants.
>>
>> >
>> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]> wrote:
>> >
>> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
>> [email protected]>
>> > wrote:
>> >
>> >> Although collecting all configuration keys into a single file is
>> >> definitely an anti-pattern I’m not sure the same is true of package or
>> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
>> >> DFSConfigKeys but geared toward our API/release auditing.
>> >>
>> >> This would seem virtuous for a couple of reasons. Relevant
>> configuration
>> >> key constants for the package or module would be grouped in a well
>> known
>> >> place for users and developers alike. The LP(CONFIG) designation would
>> >> require developers to think about deprecation cycle if contemplating a
>> >> change, thus providing some back pressure against snap decisions. Or,
>> if
>> >> not then, then at release candidate evaluation time, user configuration
>> >> breaking changes could be caught be a release automation tool that
>> diffs
>> >> LP(CONFIG) annotated classes. Something like this would improve the
>> state
>> >> of configuration key management quite dramatically, because currently
>> it’s
>> >> ad hoc.
>> >>
>> >
>> > I am supportive of trying such an effort. We'll need to tweak the
>> meaning
>> > of LP(CONFIG) as we define it currently, but that can be done. I don't
>> know
>> > what our current tools do or assume regarding this annotation. I think
>> > there is something custom happening in the compatibility reports that we
>> > generate as part of each RC.
>> >
>> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
>> <[email protected]>
>> >>> wrote:
>> >>>
>> >>> Thanks for your detailed response, Nick!
>> >>>
>> >>>> I think that none of my comments address your intended topic: how do
>> we
>> >>> publish our configuration points as an API that can be consumed by
>> user
>> >>> applications? (Do I have that correct?)
>> >>>
>> >>> This is a good summary, and I appreciate the other
>> >> thoughts/clarifications
>> >>> as well. I also realize this is probably hard to get perfect and any
>> >> choice
>> >>> must be weighed against the effort necessary to change/maintain.
>> >>>
>> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
>> much
>> >>> more knowledge of that project's history than I do. For HDFS they have
>> >>> DFSConfigKeys which in my experience does seem to include most
>> configs. I
>> >>> believe they even have unit tests which verify that all configs in the
>> >>> various site.xml files are represented in code. In more recent
>> versions
>> >>> they have split that class up into smaller groupings, for example
>> >>> DfsClientConf and the various inner classes there.
>> >>>
>> >>> In a vacuum, from a code design perspective, I'm not commenting on
>> >> whether
>> >>> that's a good or bad pattern. I also don't know of the politics of the
>> >>> project or what sorts of pain points they've discovered in that
>> pattern
>> >>> over the years. But from *user's perspective*, this is a handy way to
>> >>> handle things in my opinion.
>> >>>
>> >>> At my company, in general we try to avoid "magic strings" [1] and
>> instead
>> >>> always try to use constants. We can and do define our own constants to
>> >> try
>> >>> to mirror some of the "private" magic strings in the hbase client.
>> This
>> >> is
>> >>> better than nothing but even better would be to use hbase-provided
>> >>> constants so that we can build more defensive applications, using the
>> >>> compiler to verify that the configs we reference still do anything.
>> >>>
>> >>> I unfortunately can't speak to the original issues with HConstants
>> that
>> >>> turned it into an anti-pattern. What I do notice is there are
>> definitely
>> >>> examples in the hbase codebase of duplicated config strings, one of
>> which
>> >>> is called out in one of the jiras I linked in my original email. These
>> >> are
>> >>> just bugs waiting to happen in my opinion, either for hbase itself or
>> for
>> >>> users which may reference them.
>> >>>
>> >>> [1] https://deviq.com/antipatterns/magic-strings
>> <https://deviq.com/antipatterns/magic-strings>
>> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]>
>> >> wrote:
>> >>>>
>> >>>> Hi Bryan,
>> >>>>
>> >>>> Thanks for bringing this up.
>> >>>>
>> >>>> I agree with Duo (and I think we have this settled as project-wide
>> >>>> consensus) that HConstants is/was an anti-pattern, that we are
>> actively
>> >>>> against adding new fields there, and opportunistically removing
>> fields
>> >> when
>> >>>> we can. Further, the documented meaning of the
>> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
>> appear
>> >> in
>> >>>> user facing configuration files", so this isn't really appropriate
>> for
>> >>>> marking a field that exposes a configuration key to user
>> applications. I
>> >>>> will also note that there appears to be two categories of tunable
>> >>>> parameters -- configuration points that we expect users to tweak are
>> >>>> catalogued and documented in the book [0] and everything else is
>> left to
>> >>>> the obscurity of code-grep.
>> >>>>
>> >>>> While we are actively squashing use of fields in HConstants, I don't
>> >> know
>> >>>> that we have proposed some alternative to the user community. For my
>> >> part,
>> >>>> when I write and review code that involves configuration keys, I
>> >> generally
>> >>>> implement the key constant string as a private field in an
>> appropriate
>> >>>> class, and the unit test coverage for that configuration key
>> replicates
>> >> the
>> >>>> string in the test. My reasoning being that the string is a part of
>> our
>> >>>> public API and making a change to the public API should be detected
>> from
>> >>>> the unit test. I also have (on occasion) gone out of my way to write
>> >> about
>> >>>> the configuration keys in the package or class-level javadoc.
>> >>>>
>> >>>> I think that none of my comments address your intended topic: how do
>> we
>> >>>> publish our configuration points as an API that can be consumed by
>> user
>> >>>> applications? (Do I have that correct?)
>> >>>>
>> >>>> I am of the mind that we don't need/want an API of configurations ;
>> we
>> >> want
>> >>>> a catalogue, i.e., what has been started in our book. Perhaps
>> >> accompanied
>> >>>> by/generated from an authoritative hbase-defaults.xml file. In fact,
>> we
>> >>>> already do generate from hbase-default.xml, the result is [1] ; I
>> don't
>> >>>> believe it is authoritative.
>> >>>>
>> >>>> If we did have an AP thoughI, what would be better than the
>> HConstants
>> >>>> approach of key-strings as public fields ? What if we had a
>> >>>> ConfigurationBuilder type of class, which had methods tied to
>> >> configuration
>> >>>> keys? I would think that such a globally applicable class would have
>> the
>> >>>> same maintenance issues as HConstants. But what if we had some kind
>> of
>> >>>> ConfigurationSetter class, perhaps per package, that performed this
>> >>>> function? That might be maintainable for us and useful for users.
>> >>>>
>> >>>> I'm keen to hear what other ideas are out there, or better, examples
>> and
>> >>>> counter-examples from other projects.
>> >>>>
>> >>>> Thanks,
>> >>>> Nick
>> >>>>
>> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> >>>> <https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> >
>> >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> >
>> >>>>
>> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>>> Hi devs,
>> >>>>>
>> >>>>> As a major user of hbase, my company has thousands of clients
>> deployed
>> >>>>> which use the hbase client to connect to a variety of hbase
>> clusters.
>> >> We
>> >>>>> have a common library which handles configuring all clients by
>> setting
>> >> up
>> >>>>> the Configuration object prior to creating a Connection. Our library
>> >> sets
>> >>>>> configurations using the various configs in HConstants, but there
>> are
>> >>>> also
>> >>>>> a bunch of configs which don't exist in HConstants. For these we
>> have
>> >>>>> hardcoded config strings in our client.
>> >>>>>
>> >>>>> We're now working on an hbase upgrade and need to go through our
>> client
>> >>>>> library and check how the configs may have changed in the new
>> version.
>> >>>> This
>> >>>>> is relatively easy to do for those HConstants cases -- configs may
>> be
>> >>>>> marked @Deprecated which will show up in one's editor, they may be
>> >>>> removed
>> >>>>> entirely which would show up is a compile error, and otherwise one
>> can
>> >>>>> easily click through or bring up the javadoc. For the others that
>> don't
>> >>>>> exist in HConstants, we need to go manually search the hbase
>> codebase
>> >> for
>> >>>>> those strings.
>> >>>>>
>> >>>>> Without doing this painstaking manual process, we would potentially
>> >>>> deploy
>> >>>>> the upgraded client with configs which are no longer used or
>> deprecated
>> >>>> by
>> >>>>> the hbase client. For those using HConstants, this is immediately
>> >> obvious
>> >>>>> because the HConstant field may have been removed. This is a clear
>> >>>>> indication of needing to investigate the config. In this case it's
>> >>>>> preferred to face the compile failure because it's clearer than
>> having
>> >>>>> something silently disappear or change.
>> >>>>>
>> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
>> >>>> reasonable
>> >>>>> pushback from Duo:
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>> >
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>> >
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>> >
>> >>>>>
>> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
>> configs
>> >>>> are
>> >>>>> not part of our public API. I can agree that a catch-all constants
>> >> class
>> >>>>> might be an anti-pattern, but would argue that consolidating configs
>> >>>> there
>> >>>>> is very useful for end-users. I can also potentially agree that
>> >> exposing
>> >>>>> these as part of our public API might limit the flexibility of
>> >>>> development
>> >>>>> due to compatibility constraints about IA.Public.
>> >>>>>
>> >>>>> To me it seems odd to add a configuration, whose whole point is to
>> make
>> >>>>> something tuneable, but then bury it in a private class despite
>> having
>> >>>> real
>> >>>>> implications for how the application runs. If a configuration is not
>> >>>> meant
>> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
>> >> should
>> >>>> be
>> >>>>> exposed for reference.
>> >>>>>
>> >>>>> I'm wondering if there is some compromise we can achieve which
>> makes it
>> >>>>> easier for end-users to integrate with tunable configs.
>> >>>>>
>> >>>>> One can imagine a large project to clean up all of our configs under
>> >> some
>> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
>> >> perfect
>> >>>>> (needing to migrate all configs) the enemy of good.
>> >>>>>
>> >>>>> A better option might be to make those classes which expose configs
>> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>> >>>>> ConnectionImplementation. That might be the most incremental change
>> we
>> >>>>> could make. We could handle this on a case-by-case basis.
>> >>>>>
>> >>>>> Does anyone have any thoughts?
>> >>>>>
>> >>>>
>> >>
>>
>

Reply via email to