Hi devs,

We've been at this a while now, and I want to share an update with you.
Here's the current HDP 3.1 upgrade Jira again for reference -
https://issues.apache.org/jira/browse/METRON-2088. Nick, Ryan, and I had a
number of offline conversations in the past week discussing some of what
has been learned during the upgrade as well as how to best address some of
the feedback originating in the recent HBase PR reviews (
https://github.com/apache/metron/pull/1470#issuecomment-521033037).

*Prolog*

As this discuss thread shows, there's little debate from the community
regarding the need to upgrade to HDP 3.1, and our current version of Storm
is being EOL'ed (
https://lists.apache.org/thread.html/7dbac4e50159ec899d1965505a9844f503130b1526a0e577c705959c@%3Cdev.metron.apache.org%3E).
Metron has been running on the same version of Hadoop ever since its
graduation to a TLP a few years ago. While there have been numerous
individual component/service upgrades (Elasticsearch, Solr, and Storm, to
name a few), this is our first major platform-wide upgrade.

The biggest challenge we've run into so far has been dealing with the HBase
client APIs that were deprecated in HBase 1.x and now removed in 2.x. We
were previously able to depend on HTableInterface and HTable fully
encapsulating the connection management logic, but this was decoupled in
the new API. It is now left to the user to manage connection lifecycle.
Nick Allen spent time exploring various options for how to move forward
with new abstractions that would accommodate a new connection strategy as
well as work within our existing Storm architecture. Fully extracting logic
for managing HBase connections independent of the Tables proved to have
dramatic ripple effects throughout the architecture (again, refer to
https://github.com/apache/metron/pull/1470#issuecomment-521033037 for more
details). We also encountered major changes in the core classes that our
MockHTable implementation relies on in the integration tests. These two
problems resulted in interface changes that affected quite a bit of code.

Reflecting on what we learned from this refactoring push, we explored other
options to reduce the overall surface area impacted by the API change. The
main thrust of our work seems to hinge on how to deal with the new
connection management problem. We looked at options for how to leverage the
existing TableProvider abstraction and decided to try out a compromise
approach that allows us to:

   1. Upgrade as much of the API as possible in the current version of
   HBase against master
   2. Manage connections within the TableProvider abstraction - this gives
   an API feel similar to what the HTableInterface we rely on currently
   encapsulated for us, and removes a large chunk of the code that would
   otherwise be necessary to finish the upgrade.


*Reducing Scope*

We have long-lived connections to HBase that don't need to be opened/closed
and pooled in the traditional request/response lifecycle sense. We know at
the time our application spins up how many processes and threads there will
be - this is static for us. I put together a PR (
https://issues.apache.org/jira/browse/METRON-2217) that migrates HTable and
HTableInterface classes to the new Table API and encapsulates connection
management within the TableProviders. We had some concerns about risks
associated with managing the connections this way, as opposed to using a
more robust connection management approach, so I reached out to the HBase
community to get some guidance. The feedback we received suggests that
managing our connections this way should be sufficient, and HBase
Connection objects are thread-safe, to boot. See the HBase dev thread:
https://lists.apache.org/thread.html/6b83cd7548efb8c37899063affc97e4c5ce823a13359a49b477e3c07@%3Cdev.hbase.apache.org%3E
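
To make the connection-per-JVM idea above a bit more concrete, here is a minimal Java sketch under stated assumptions. It is not the code from METRON-2217: `SketchConnection`, `SketchTable`, `InMemoryConnection`, and `SharedConnectionTableProvider` are hypothetical names standing in for HBase 2.x's `Connection`/`Table` and for our TableProvider, so the example compiles and runs without HBase on the classpath. It shows a provider that lazily opens a single long-lived connection, shares it across threads, and hands out lightweight tables from it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-ins for HBase 2.x's Connection and Table, so this sketch
// compiles and runs without HBase on the classpath.
interface SketchConnection extends AutoCloseable {
  SketchTable getTable(String tableName);
  @Override void close(); // narrowed: no checked exception
}

interface SketchTable extends AutoCloseable {
  String name();
  @Override void close();
}

// Toy in-memory connection, used only to exercise the provider below.
final class InMemoryConnection implements SketchConnection {
  private volatile boolean closed;

  @Override
  public SketchTable getTable(String tableName) {
    if (closed) {
      throw new IllegalStateException("connection is closed");
    }
    return new SketchTable() {
      @Override public String name() { return tableName; }
      // Toy table holds no resources of its own; the connection owns them.
      @Override public void close() {}
    };
  }

  @Override public void close() { closed = true; }

  boolean isClosed() { return closed; }
}

// Hypothetical TableProvider variant: lazily opens ONE long-lived connection on
// first use and hands out lightweight tables from it, so callers never manage
// the connection lifecycle themselves.
final class SharedConnectionTableProvider implements AutoCloseable {
  private final Supplier<SketchConnection> connectionFactory;
  private volatile SketchConnection connection;

  SharedConnectionTableProvider(Supplier<SketchConnection> connectionFactory) {
    this.connectionFactory = connectionFactory;
  }

  SketchTable getTable(String tableName) {
    return connection().getTable(tableName);
  }

  // Double-checked locking on a volatile field: at most one connection is
  // created, and every executor thread sees the same instance.
  private SketchConnection connection() {
    SketchConnection c = connection;
    if (c == null) {
      synchronized (this) {
        c = connection;
        if (c == null) {
          c = connectionFactory.get();
          connection = c;
        }
      }
    }
    return c;
  }

  // Intended to be called once at worker shutdown, not per request.
  @Override
  public synchronized void close() {
    if (connection != null) {
      connection.close();
      connection = null;
    }
  }
}

public class TableProviderSketch {
  public static void main(String[] args) {
    AtomicInteger opened = new AtomicInteger();
    try (SharedConnectionTableProvider provider = new SharedConnectionTableProvider(() -> {
          opened.incrementAndGet();
          return new InMemoryConnection();
        });
        SketchTable enrichment = provider.getTable("enrichment");
        SketchTable threatIntel = provider.getTable("threatintel")) {
      // prints "enrichment, threatintel via 1 connection(s)"
      System.out.println(enrichment.name() + ", " + threatIntel.name()
          + " via " + opened.get() + " connection(s)");
    }
  }
}
```

The design leans on the split the HBase client docs describe: a `Connection` is heavyweight and thread-safe, while `Table` instances are lightweight and not thread-safe, so one connection is shared per worker and tables are obtained per use. In a real provider the supplier would presumably wrap `ConnectionFactory.createConnection(conf)` (which throws `IOException`, so it would need wrapping), and `close()` would run once at shutdown, for example from a bolt's `cleanup()`.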

*A Revised Plan*

The alternative HBase client/connection approach is promising, and it
greatly reduces the overall architectural impact we will need to absorb
alongside a major upgrade. The following is my proposal after some coding
experiments and numerous conversations with Nick Allen, Ryan Merriman,
Casey Stella, Otto Fowler, and James Sirota.


   1. Do as much refactoring as possible in small chunks in master, e.g.
   the first phase of the HBase API changes. Reducing the number of
   variables changing at the same time in the same place should reduce the
   overall risk of the upgrade. If we prove stability in master wherever we
   can, the issues we run into in the feature branch (FB) should be easier
   to isolate and solve.
   2. Target the upgrade feature branch as a place where we primarily deal
   with changes due to classpath problems. There will be some other
   necessary code changes, e.g. the HBase coprocessor, but those changes
   should be well-isolated and narrower in scope.
   3. Ryan and I have had numerous conversations about the Maven
   dependency classpath issues that frequently surface at runtime whenever
   even the smallest change to our dependency tree occurs. I won't go into
   those details now, but you can see the discussion and history here (
   https://github.com/apache/metron/pull/1436). While there's inherent risk
   in making big changes to our dependency management, there is also a
   substantial upside: this PR makes finding and solving classpath problems
   much easier. This PR is ready to go in master and should greatly speed up
   our ability to rectify any classpath problems we encounter in the feature
   branch.
   4. Find an analog for our port of the MockHTable (
   https://github.com/apache/metron/blob/master/metron-platform/metron-hbase/metron-hbase-common/src/test/java/org/apache/metron/hbase/mock/MockHTable.java)
   in HBase 2.0.2. Nick has been working on a POC for this alongside my work
   on the other API migration, and he has gotten the integration tests
   passing. We had originally hoped to land this in master, but the
   underlying low-level supporting classes have changed and are not forwards
   compatible the way the Table interface and ConnectionFactory class are.
   We plan to land this in the feature branch instead.
   5. Manage component version changes in an HDP 3.1 profile that gets
   updated as PRs are submitted. This allows the modules to be upgraded on a
   per-component basis, while still compiling and allowing tests to run,
   without requiring a big bang all-or-nothing upgrade. We would then do a
   final reconciliation and deprecation of the old Hadoop versions and profile
   at the tail of the FB. https://issues.apache.org/jira/browse/METRON-2223
   6. Upgrade Kafka, Storm, Solr, and Zeppelin. PRs from Ryan are up in the
   feature branch now.
   7. Revert the HBase feature branch PRs that have already gone in. The
   new approach makes those HBase client changes unnecessary, so we should
   remove them before polishing off the HBase upgrade.
   8. Merge in master - including the Maven and HTable migration PRs
   9. Finish HBase upgrade: coprocessor, integration test changes, data
   management
   10. Upgrade Hadoop
   11. Final dependency reconciliation
   https://issues.apache.org/jira/browse/METRON-2223
   12. Acceptance testing
   13. Beers, Profit

I think I've covered the major tasks, but if I've missed anything please
reach out.

Best,
Mike Miklavcic



On Tue, Apr 23, 2019 at 8:18 AM Nick Allen <n...@nickallen.org> wrote:

> FYI - I opened a ticket to serve as an epic for this work and the feature
> branch.
>
> https://issues.apache.org/jira/browse/METRON-2088
>
> On Mon, Apr 22, 2019 at 3:32 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > +1 to starting a feature branch for this.
> > +1 to removing our custom implementations if the newest revs are in fact
> > stable now.
> >
> > Regarding the profile option - if it's possible to keep 2.6.5 for a bit and
> > not require separate branches or code trees, this is probably OK.
> > Otherwise, I'm inclined to take the approach we've taken in the past with
> > every other upgrade and only support 1 version. I think we should prepare
> > users for the likelihood that if/when we cut over, there will be no more
> > updates to 2.6.x.
> >
> > I talked through this a bit with Nick and Ryan Merriman offline. There are
> > a number of major version revs of components from HDP 2.6 to 3.x that are
> > likely to have backwards compatibility problems. HBase is a big one that
> > comes to mind - I noticed the HTable interface was deprecated while working
> > through the coprocessor implementation, and Ryan found that it was removed
> > completely in the new version. That affects our integration tests as well,
> > because we have a rather large mock implementation of HBase in use that is
> > built around the removed API. We will either need to migrate to the new API
> > or find an alternative approach to integration testing with HBase.
> >
> > I'll let Nick add more detail in the Epic/Jira and feature branch plan, but
> > here is a sampling of the areas we expect will require some work to
> > upgrade:
> >
> >    - Ambari - the current MPack is incompatible with Ambari 2.7.3, and
> >    there isn't a breaking-changes document, so we'll have to work through
> >    this by brute force or, hopefully, find some help from the Ambari
> >    community.
> >    - MaaS - YARN major change
> >    - PCAP - HDFS, Kafka
> >    - Indexing - HDFS, Solr
> >    - All topologies - Kafka
> >    - Stellar - HDFS, HBase
> >    - Enrichment - HBase
> >    - Enrichment Coprocessor (the enrichments listing) - HBase
> >    - Integration tests - Kafka and HBase have changed considerably.
> >    - UI, REST - Solr, HDFS, HBase
> >    - Knox
> >    - Kerberos (hopefully this is a kick-the-tires effort, though there is
> >    some possible risk if Ambari and the individual components introduce
> >    changes here)
> >
> > Fortunately, ZooKeeper appears to have stayed the same across versions. It
> > might be worthwhile to get a chart of the versions for each platform added
> > to the epic Jira for reference while performing this work.
> >
> > Best,
> > Mike
> >
> >
> > On Mon, Apr 22, 2019, 12:50 PM Nick Allen <n...@nickallen.org> wrote:
> >
> > > We currently support running Metron on an HDP 2.6.5 cluster
> > > (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/comp_versions.html).
> > > I'd like to get Metron updated to run in an HDP 3.1.0 cluster
> > > (https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/comp_versions.html).
> > > This provides a number of significant updates to the core platform
> > > components that we depend on like Kafka, HBase, Ambari, etc.
> > >
> > > ### Feature Branch
> > >
> > > I'd like to create a feature branch in which to do this. This will take a
> > > good amount of effort and multiple PRs. To avoid any impact to master as we
> > > progress through this, a feature branch would make sense.
> > >
> > > If you have concerns or interest in this effort, please speak up. Here are
> > > some relevant discussion points based on what I know so far.
> > >
> > > ### CentOS 7
> > >
> > > CentOS 6 RPMs are no longer distributed for HDP 3.1.0, only CentOS 7 RPMs.
> > > Because of this we will likely need to transition Full Dev over to
> > > CentOS 7. I don't see a downside to doing this since 6 is rather old and I
> > > assume that most users run variants of 7 already anyway.
> > >
> > > ### HDP 2.6.5
> > >
> > > I'd like to try and make these changes backwards compatible with HDP 2.6.5
> > > if possible, but only as long as that does not increase our ongoing
> > > development burden.
> > >
> > > For example, if I can simply define a separate build profile for 3.1.0 and
> > > things are generally backwards compatible, then I'm all for maintaining
> > > support for 2.6.5. On the other hand, I would not want to go as far as
> > > maintaining separate master branches for each. In my mind the ongoing cost
> > > there is too high.
> > >
> > > ### HDP 2.5.6
> > >
> > > There are some workarounds in the code base that were introduced to
> > > support HDP 2.5.6
> > > (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.6/bk_release-notes/content/comp_versions.html)
> > > when we moved to HDP 2.6.5. Some of these workarounds are specifically for
> > > older versions of Storm like 1.0.x. Rather than maintaining this going
> > > forward, I'd prefer we remove this technical debt and not support anything
> > > older than HDP 2.6.5.
> > >
> > >
> > >
> > >
> > > Best,
> > > Nick
> > >
> >
>
