This is almost done.

The final outstanding patch is https://github.com/apache/hbase/pull/5766
for the new Hadoop-less assembly.

Could you please review it?



On Sat, Mar 9, 2024 at 8:48 AM Nihal Jain <nihaljain...@gmail.com> wrote:

> I have created sub-tasks with the necessary details in the umbrella Jira. I will
> take them up in the coming days, and will add more sub-tasks later if needed.
>
> Regards
> Nihal
>
> On Sat, 9 Mar 2024, 11:53 Istvan Toth, <st...@cloudera.com.invalid> wrote:
>
> > Thank you Nihal.
> > I'm not very familiar with the tools in the test code, so you can probably
> > plan that work better.
> > I just have some generic steps in mind:
> > * Identify all the tools / scripts in the test jars
> > * Identify and analyze their dependencies (compared to the current runtime
> > deps)
> > * Decide which ones to move to the runtime JARs.
> > * Move them to the runtime code (or perhaps a separate module)
> >
> > I have created https://issues.apache.org/jira/browse/HBASE-28431 as an
> > umbrella ticket to organize the sub-tasks.
> >
> > Istvan
> >
> > On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain <nihaljain...@gmail.com> wrote:
> >
> > > Sure, I will be able to take this up. Please create tasks with the necessary
> > > details, or let me know if you want me to create them.
> > >
> > > On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid> wrote:
> > >
> > > > Thanks for volunteering, Nihal.
> > > >
> > > > I could work on the Hadoop-less, and assemblies, and you could work on
> > > > cleaning up the test jars.
> > > > Would that work for you?
> > > > I know that I'm picking the smaller part, but it turns out that I won't
> > > > have as much time to work on this as I hoped.
> > > >
> > > > (Unless there are other volunteers, of course)
> > > >
> > > > Istvan
> > > >
> > > > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com> wrote:
> > > >
> > > > > We seem to be in agreement in principle; however, the devil is in the
> > > > > details.
> > > > >
> > > > > The first step should be moving the diagnostic tools out of the test
> > > > > jars.
> > > > > Are there any tools we don't want to move out?
> > > > > Do the diagnostic tools pull in extra dependencies compared to the
> > > > > current runtime JARs, and if they do, what are those?
> > > > > I haven't thought of the chaosmonkey tests yet, do those have specific
> > > > > additional dependencies / scripts?
> > > > >
> > > > > Should we move the tools simply to the normal jars, or should we move
> > > > > them to a new module (could be called hbase-diagnostics)?
> > > > >
> > > > > Istvan
> > > > >
> > > > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault
> > > > > <bbeaudrea...@apache.org> wrote:
> > > > >
> > > > > >> I'm +0 on hbase-examples, but +1000000 on any improvements we can make
> > > > > >> to ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much
> > > > > >> reliance we have on test jars both generally but also specifically
> > > > > >> around these core test executables. Unfortunately I haven't had time
> > > > > >> to dedicate to these frustrations myself, but happy to help with
> > > > > >> review, etc.
> > > > >>
> > > > > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com>
> > > > > >> wrote:
> > > > >>
> > > > >> > Thank you for bringing this up.
> > > > >> >
> > > > >> > +1 for this change.
> > > > >> >
> > > > > >> > In fact, some time back, we faced a similar problem. Security scans
> > > > > >> > found that we were bundling a vulnerable Hadoop test jar. To deal
> > > > > >> > with that, we had to make a change in our internal HBase fork to
> > > > > >> > exclude all HBase and Hadoop test jars from the assembly. This helped
> > > > > >> > us get rid of the vulnerable jar. (Although I hadn't dealt with test
> > > > > >> > scope dependencies there.)
> > > > > >> >
> > > > > >> > I have been thinking of pushing this change to Apache HBase, but
> > > > > >> > wasn't sure if it was even acceptable. It's great to see the same
> > > > > >> > has been brought up here today.
> > > > > >> >
> > > > > >> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to
> > > > > >> > download them on demand, to avoid a massive code change in the
> > > > > >> > internal fork. But I am +1 on the idea of identifying and moving all
> > > > > >> > such tools to a new module. This would be great and make things
> > > > > >> > easier for us as well.
> > > > > >> >
> > > > > >> > Also, a way we could help new users easily get started, in case we
> > > > > >> > completely stop bundling Hadoop jars, is by providing a script which
> > > > > >> > starts an HBase cluster in a single node setup. In fact, I had
> > > > > >> > written a simple script some time back that automates this process
> > > > > >> > given a release link for both. It first downloads the Hadoop and
> > > > > >> > HBase binaries and then starts both with the HBase root directory set
> > > > > >> > to be on HDFS. We could provide something similar to help new users
> > > > > >> > get started easily.
> > > > > >> >
> > > > > >> > Although I am also +1 on the idea to provide both variants as
> > > > > >> > mentioned by Nick, which might not even need any such script.
> > > > > >> >
> > > > > >> > Also, I am willing to volunteer to help with this effort. Please let
> > > > > >> > me know if anything is needed.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Nihal
> > > > >> >
> > > > >> >
> > > > > >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org> wrote:
> > > > >> >
> > > > > >> > > This would be great cleanup, big +1 from me for all three of these
> > > > > >> > > adjustments, including the promotion of pe, ltt, and friends out of
> > > > > >> > > the test scope.
> > > > > >> > >
> > > > > >> > > I believe that we included hbase test jars because we used to freely
> > > > > >> > > mix classes needed for minicluster between runtime and test jars,
> > > > > >> > > which in turn relied on Hadoop minicluster capabilities. The big
> > > > > >> > > cleanup around HBaseTestingUtil/it addressed much (or all) of these
> > > > > >> > > issues on branch-3.
> > > > > >> > >
> > > > > >> > > I believe that we include a Hadoop distribution in our assembly
> > > > > >> > > because that makes it easy for a new user to download our release
> > > > > >> > > bin.tgz and get started immediately with learning. I guess it’s high
> > > > > >> > > time that we work out the with- and without-Hadoop variants.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Nick
> > > > >> > >
> > > > > >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org> wrote:
> > > > >> > >
> > > > > >> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way
> > > > > >> > > > mapped out to achieve this, this is about discussing whether we
> > > > > >> > > > even want to make these changes.
> > > > > >> > > > These are also substantial changes, but they could be targeted
> > > > > >> > > > for HBase 3.0.
> > > > >> > > >
> > > > > >> > > > One issue I have noticed is that we ship test jars and test
> > > > > >> > > > dependencies in the assembly.
> > > > > >> > > > I can't see anyone using those, but it bloats the assembly and
> > > > > >> > > > classpath, and adds unnecessary JARs with possible CVE issues. (for
> > > > > >> > > > example Kerby which is a Hadoop minicluster dependency)
> > > > > >> > > >
> > > > > >> > > > My proposal is to exclude the test jars and the test scope
> > > > > >> > > > dependencies from the assembly.
> > > > >> > > >
> > > > >> > > > The advantages would be:
> > > > >> > > > * Smaller distro size
> > > > >> > > > * Faster startup (this is marginal)
> > > > >> > > > * Less CVE-prone JARs in the binary assemblies
> > > > >> > > >
> > > > > >> > > > The other issue is that the assembly includes much of the Hadoop
> > > > > >> > > > distribution.
> > > > > >> > > > The basic assumption in all scripts and instructions is that the
> > > > > >> > > > node has a fully configured Hadoop installation, and we include it
> > > > > >> > > > in the classpath of HBase.
> > > > > >> > > >
> > > > > >> > > > If that is true, then there is no reason to include Hadoop in the
> > > > > >> > > > assembly, HBase and its direct dependencies should be enough.
> > > > > >> > > >
> > > > > >> > > > One could argue that it would simplify the client side, which is
> > > > > >> > > > true to some extent (though 95% of the client distro use cases are
> > > > > >> > > > served better by simply using hbase-shaded-client).
> > > > > >> > > >
> > > > > >> > > > We could either remove the Hadoop libraries from either or both of
> > > > > >> > > > the assemblies unconditionally, or provide two variants for either
> > > > > >> > > > or both assemblies, one with Hadoop included, and one without it.
> > > > > >> > > > Spark already does this, it has binary distributions both with and
> > > > > >> > > > without Hadoop.
> > > > >> > > >
> > > > >> > > > The advantages would be:
> > > > >> > > > * Smaller distro size
> > > > >> > > > * Faster startup (this is marginal)
> > > > >> > > > * Less chance of conflicts with the Hadoop jars
> > > > >> > > > * Less CVE-prone JARs in the binary assemblies
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > Thirdly, we could consider excluding the full-fat
> > > > > >> > > > org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> > > > > >> > > > binary assemblies. It is not used by the assembly, and AFAIK it is
> > > > > >> > > > not included in any of the 'hbase classpath' command variants.
> > > > > >> > > >
> > > > > >> > > > This would make sure that no Hadoop libraries are included (even in
> > > > > >> > > > shaded form) and would make the HBase distribution fully insulated
> > > > > >> > > > from Hadoop's CVE issues.
> > > > > >> > > >
> > > > > >> > > > (The full-fat hbase-shaded-client works best as direct build-time
> > > > > >> > > > dependency anyway)
> > > > >> > > >
> > > > >> > > > best regards
> > > > >> > > > Istvan
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > *Email*: st...@cloudera.com
> > > > > cloudera.com <https://www.cloudera.com>
> > > > > [image: Cloudera] <https://www.cloudera.com/>
> > > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > > > > > Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > > > ------------------------------
> > > > > ------------------------------
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
