Yes, we've discussed this issue, but deferred solving it.

IMO the easiest way is to add a third assembly which is functionally
equivalent to the 2.x assembly.
That could work for running hbase-it, ITBLL, chaos monkey, etc.
We'd also have to decide whether to build it by default, and whether we
want to publish it as part of official releases.

In theory we could also make a delta assembly that only includes the
additional test related stuff, but I'm afraid that that would require a lot
of maintenance.
We could also add a script/maven target that downloads test-related JARs
from maven, but keeping that one up to date would also be problematic.
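
For illustration only, here is a rough sketch of what such a download helper
might look like. It assumes the jars are published to Maven Central under the
usual repository layout and the `tests` classifier. The module list, the
helper names and the version passed in are placeholders I made up for the
example, not a verified set, and it does not resolve transitive dependencies
(such as the Hadoop test jars).

```python
from dataclasses import dataclass
from pathlib import Path
from urllib.request import urlretrieve

# Standard Maven Central base URL.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class Artifact:
    group_id: str
    artifact_id: str
    version: str
    classifier: str = ""  # "tests" selects the test-jar of a module

    def file_name(self) -> str:
        suffix = f"-{self.classifier}" if self.classifier else ""
        return f"{self.artifact_id}-{self.version}{suffix}.jar"

    def url(self, repo: str = MAVEN_CENTRAL) -> str:
        # Maven repository layout: group path / artifact / version / file
        group_path = self.group_id.replace(".", "/")
        return f"{repo}/{group_path}/{self.artifact_id}/{self.version}/{self.file_name()}"


def wanted_jars(version: str) -> list[Artifact]:
    # ASSUMPTION: illustrative module list only, not a verified or complete set.
    modules = [("hbase-it", ""), ("hbase-it", "tests"), ("hbase-server", "tests")]
    return [Artifact("org.apache.hbase", m, version, c) for m, c in modules]


def download_all(version: str, lib_dir: str) -> list[Path]:
    """Fetch each jar into lib_dir; transitive dependencies are NOT resolved."""
    target = Path(lib_dir)
    target.mkdir(parents=True, exist_ok=True)
    fetched = []
    for artifact in wanted_jars(version):
        dest = target / artifact.file_name()
        urlretrieve(artifact.url(), dest)
        fetched.append(dest)
    return fetched
```

The hard-coded list in `wanted_jars` is exactly the part that would have to be
kept in sync with the build by hand, which is the maintenance problem
mentioned above.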

Istvan

On Sat, Mar 15, 2025 at 4:51 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> After this change, we can not run ITBLL on 3.0.0 because hbase-it is
> also excluded...
>
> I tried manually copying all the tests jar and hbase-it jar to the lib
> directory but it did not work, I guess we still missed several hadoop
> jars...
>
> So what is the suggested way to run ITBLL after this change?
>
> Thanks.
>
> Istvan Toth <st...@cloudera.com.invalid> wrote on Mon, Jan 20, 2025 at 14:20:
> >
> > This is almost done.
> >
> > The final outstanding patch is https://github.com/apache/hbase/pull/5766
> > for the new Hadoop-less assembly.
> >
> > Could you please review it ?
> >
> >
> >
> > On Sat, Mar 9, 2024 at 8:48 AM Nihal Jain <nihaljain...@gmail.com> wrote:
> >
> > > I have created sub-tasks with the necessary details in the umbrella Jira.
> > > I will take them up in the coming days, and will add more sub-tasks
> > > later if needed.
> > >
> > > Regards
> > > Nihal
> > >
> > > On Sat, 9 Mar 2024, 11:53 Istvan Toth, <st...@cloudera.com.invalid> wrote:
> > >
> > > > Thank you Nihal.
> > > > I'm not very familiar with the tools in the test code, so you can
> > > > probably plan that work better.
> > > > I just have some generic steps in mind:
> > > > * Identify all the tools / scripts in the test jars
> > > > * Identify and analyze their dependencies (compared to the current
> > > >   runtime deps)
> > > > * Decide which ones to move to the runtime JARs.
> > > > * Move them to the runtime code (or perhaps a separate module)
> > > >
> > > > I have created https://issues.apache.org/jira/browse/HBASE-28431 as an
> > > > umbrella ticket to organize the sub-tasks.
> > > >
> > > > Istvan
> > > >
> > > > On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain <nihaljain...@gmail.com> wrote:
> > > >
> > > > > Sure, I will be able to take this up. Please create tasks with the
> > > > > necessary details, or let me know if you want me to create them.
> > > > >
> > > > > On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid> wrote:
> > > > >
> > > > > > Thanks for volunteering, Nihal.
> > > > > >
> > > > > > I could work on the Hadoop-less, and assemblies, and you could work
> > > > > > on cleaning up the test jars.
> > > > > > Would that work for you ?
> > > > > > I know that I'm picking the smaller part, but it turns out that I
> > > > > > won't have as much time to work on this as I hoped.
> > > > > >
> > > > > > (Unless there are other volunteers, of course)
> > > > > >
> > > > > > Istvan
> > > > > >
> > > > > > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com> wrote:
> > > > > >
> > > > > > > We seem to be in agreement in principle, however the devil is in
> > > > > > > the details.
> > > > > > >
> > > > > > > The first step should be moving the diagnostic tools out of the
> > > > > > > test jars.
> > > > > > > Are there any tools we don't want to move out ?
> > > > > > > Do the diagnostic tools pull in extra dependencies compared to
> > > > > > > the current runtime JARs, and if they do, what are those ?
> > > > > > > I haven't thought of the chaosmonkey tests yet, do those have
> > > > > > > specific additional dependencies / scripts ?
> > > > > > >
> > > > > > > Should we move the tools simply to the normal jars, or should we
> > > > > > > move them to a new module (could be called hbase-diagnostics) ?
> > > > > > >
> > > > > > > Istvan
> > > > > > >
> > > > > > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
> > > > > > >
> > > > > > >> I'm +0 on hbase-examples, but +1000000 on any improvements we can
> > > > > > >> make to ltt/pe/chaos/minicluster/etc. It's extremely frustrating
> > > > > > >> how much reliance we have on test jars both generally but also
> > > > > > >> specifically around these core test executables. Unfortunately I
> > > > > > >> haven't had time to dedicate to these frustrations myself, but
> > > > > > >> happy to help with review, etc.
> > > > > > >>
> > > > > > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Thank you for bringing this up.
> > > > > > >> >
> > > > > > >> > +1 for this change.
> > > > > > >> >
> > > > > > >> > In fact, some time back we faced a similar problem. Security
> > > > > > >> > scans found that we were bundling a vulnerable Hadoop test
> > > > > > >> > jar. To deal with that, we had to change our internal HBase
> > > > > > >> > fork to exclude all HBase and Hadoop test jars from the
> > > > > > >> > assembly. This helped us get rid of the vulnerable jar.
> > > > > > >> > (Although I hadn't dealt with test scope dependencies
> > > > > > >> > there.)
> > > > > > >> >
> > > > > > >> > I have been thinking of pushing this change to Apache HBase,
> > > > > > >> > but wasn't sure whether it was even acceptable. It's great
> > > > > > >> > to see that the same has been brought up here today.
> > > > > > >> >
> > > > > > >> > We hadn't dealt with the ltt, pe, etc. tools, and wrote a
> > > > > > >> > script to download them on demand to avoid a massive code
> > > > > > >> > change in our internal fork. But I am +1 on the idea of
> > > > > > >> > identifying and moving all such tools to a new module. This
> > > > > > >> > would be great and would make things easier for us as well.
> > > > > > >> >
> > > > > > >> > Also, one way we could help new users get started easily, in
> > > > > > >> > case we completely stop bundling Hadoop jars, is to provide
> > > > > > >> > a script that starts an HBase cluster in a single-node
> > > > > > >> > setup. In fact, I wrote a simple script some time back that
> > > > > > >> > automates this process given a release link for both. It
> > > > > > >> > first downloads the Hadoop and HBase binaries and then
> > > > > > >> > starts both, with the HBase root directory set to be on
> > > > > > >> > HDFS. We could provide something similar to help new users
> > > > > > >> > get started easily.
> > > > > > >> >
> > > > > > >> > That said, I am also +1 on the idea of providing both
> > > > > > >> > variants, as mentioned by Nick, which might not even need
> > > > > > >> > any such script.
> > > > > > >> >
> > > > > > >> > Also, I am willing to volunteer to help with this effort.
> > > > > > >> > Please let me know if anything is needed.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Nihal
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org> wrote:
> > > > > > >> >
> > > > > > >> > > This would be great cleanup, big +1 from me for all three
> > > > > > >> > > of these adjustments, including the promotion of pe, ltt,
> > > > > > >> > > and friends out of the test scope.
> > > > > > >> > >
> > > > > > >> > > I believe that we included hbase test jars because we used
> > > > > > >> > > to freely mix classes needed for minicluster between
> > > > > > >> > > runtime and test jars, which in turn relied on Hadoop
> > > > > > >> > > minicluster capabilities. The big cleanup around
> > > > > > >> > > HBaseTestingUtil/it addressed much (or all) of these
> > > > > > >> > > issues on branch-3.
> > > > > > >> > >
> > > > > > >> > > I believe that we include a Hadoop distribution in our
> > > > > > >> > > assembly because that makes it easy for a new user to
> > > > > > >> > > download our release bin.tgz and get started immediately
> > > > > > >> > > with learning. I guess it’s high time that we work out the
> > > > > > >> > > with- and without-Hadoop variants.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Nick
> > > > > > >> > >
> > > > > > >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org> wrote:
> > > > > > >> > >
> > > > > > >> > > > DISCLAIMER: I don't have a patch ready, or even an
> > > > > > >> > > > elegant way mapped out to achieve this; this is about
> > > > > > >> > > > discussing whether we even want to make these changes.
> > > > > > >> > > > These are also substantial changes, but they could be
> > > > > > >> > > > targeted for HBase 3.0.
> > > > > > >> > > >
> > > > > > >> > > > One issue I have noticed is that we ship test jars and
> > > > > > >> > > > test dependencies in the assembly.
> > > > > > >> > > > I can't see anyone using those, but it bloats the
> > > > > > >> > > > assembly and classpath, and adds unnecessary JARs with
> > > > > > >> > > > possible CVE issues (for example Kerby, which is a
> > > > > > >> > > > Hadoop minicluster dependency).
> > > > > > >> > > >
> > > > > > >> > > > My proposal is to exclude the test jars and the test
> > > > > > >> > > > scope dependencies from the assembly.
> > > > > > >> > > >
> > > > > > >> > > > The advantages would be:
> > > > > > >> > > > * Smaller distro size
> > > > > > >> > > > * Faster startup (this is marginal)
> > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
> > > > > > >> > > >
> > > > > > >> > > > The other issue is that the assembly includes much of
> > > > > > >> > > > the Hadoop distribution.
> > > > > > >> > > > The basic assumption in all scripts and instructions is
> > > > > > >> > > > that the node has a fully configured Hadoop
> > > > > > >> > > > installation, and we include it in the classpath of
> > > > > > >> > > > HBase.
> > > > > > >> > > >
> > > > > > >> > > > If that is true, then there is no reason to include
> > > > > > >> > > > Hadoop in the assembly; HBase and its direct
> > > > > > >> > > > dependencies should be enough.
> > > > > > >> > > >
> > > > > > >> > > > One could argue that it would simplify the client side,
> > > > > > >> > > > which is true to some extent (though 95% of the client
> > > > > > >> > > > distro use cases are served better by simply using
> > > > > > >> > > > hbase-shaded-client).
> > > > > > >> > > >
> > > > > > >> > > > We could either remove the Hadoop libraries from either
> > > > > > >> > > > or both of the assemblies unconditionally, or provide
> > > > > > >> > > > two variants for either or both assemblies, one with
> > > > > > >> > > > Hadoop included, and one without it.
> > > > > > >> > > > Spark already does this; it has binary distributions
> > > > > > >> > > > both with and without Hadoop.
> > > > > > >> > > >
> > > > > > >> > > > The advantages would be:
> > > > > > >> > > > * Smaller distro size
> > > > > > >> > > > * Faster startup (this is marginal)
> > > > > > >> > > > * Less chance of conflicts with the Hadoop jars
> > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Thirdly, we could consider excluding the full-fat
> > > > > > >> > > > org.apache.hbase:hbase-shaded-client JAR from the
> > > > > > >> > > > Hadoop-less binary assemblies. It is not used by the
> > > > > > >> > > > assembly, and AFAIK it is not included in any of the
> > > > > > >> > > > 'hbase classpath' command variants.
> > > > > > >> > > >
> > > > > > >> > > > This would make sure that no Hadoop libraries are
> > > > > > >> > > > included (even in shaded form) and would make the HBase
> > > > > > >> > > > distribution fully insulated from Hadoop's CVE issues.
> > > > > > >> > > >
> > > > > > >> > > > (The full-fat hbase-shaded-client works best as a direct
> > > > > > >> > > > build-time dependency anyway)
> > > > > > >> > > >
> > > > > > >> > > > best regards
> > > > > > >> > > > Istvan
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > > > *Email*: st...@cloudera.com
> > > > > > > cloudera.com <https://www.cloudera.com>
> > > > > > > [image: Cloudera] <https://www.cloudera.com/>
> > > > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> > > [image:
> > > > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
> [image:
> > > > > > > Cloudera on LinkedIn] <
> https://www.linkedin.com/company/cloudera>
> > > > > > > ------------------------------
> > > > > > > ------------------------------
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>
