Sure, I will be able to take it up. Please create tasks with the necessary details, or let me know if you want me to create them.
On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid> wrote:
> Thanks for volunteering, Nihal.
>
> I could work on the Hadoop-less assemblies, and you could work on
> cleaning up the test jars.
> Would that work for you?
> I know that I'm picking the smaller part, but it turns out that I won't
> have as much time to work on this as I hoped.
>
> (Unless there are other volunteers, of course)
>
> Istvan
>
> On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com> wrote:
>
> > We seem to be in agreement in principle; however, the devil is in the
> > details.
> >
> > The first step should be moving the diagnostic tools out of the test jars.
> > Are there any tools we don't want to move out?
> > Do the diagnostic tools pull in extra dependencies compared to the current
> > runtime JARs, and if they do, what are those?
> > I haven't thought of the chaosmonkey tests yet; do those have specific
> > additional dependencies / scripts?
> >
> > Should we simply move the tools to the normal jars, or should we move them
> > to a new module (which could be called hbase-diagnostics)?
> >
> > Istvan
> >
> > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault <bbeaudrea...@apache.org>
> > wrote:
> >
> >> I'm +0 on hbase-examples, but +1000000 on any improvements we can make to
> >> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much reliance
> >> we have on test jars, both generally and specifically around these core
> >> test executables. Unfortunately I haven't had time to dedicate to these
> >> frustrations myself, but I'm happy to help with review, etc.
> >>
> >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com> wrote:
> >>
> >> > Thank you for bringing this up.
> >> >
> >> > +1 for this change.
> >> >
> >> > In fact, some time back, we had faced a similar problem. Security scans
> >> > found that we were bundling a vulnerable Hadoop test jar.
> >> > To deal with that, we had to make a change in our internal HBase fork to
> >> > exclude all HBase and Hadoop test jars from the assembly. This helped us
> >> > get rid of the vulnerable jar. (Although I hadn't dealt with test scope
> >> > dependencies there.)
> >> >
> >> > But I have been thinking of pushing this change to Apache HBase; I just
> >> > wasn't sure if this was even acceptable. It's great to see the same being
> >> > brought up here today.
> >> >
> >> > We hadn't dealt with the ltt, pe etc. tools, and instead wrote a script to
> >> > download them on demand to avoid a massive code change in our internal
> >> > fork. But I am +1 on the idea of identifying and moving all such tools to
> >> > a new module. This would be great and make things easier for us as well.
> >> >
> >> > Also, one way we could help new users get started easily, in case we
> >> > completely stop bundling Hadoop jars, is by providing a script which
> >> > starts an HBase cluster in a single-node setup. In fact, I had written a
> >> > simple script some time back that automates this process given a release
> >> > link for both. It first downloads the Hadoop and HBase binaries and then
> >> > starts both, with the HBase root directory set to be on HDFS. We could
> >> > provide something similar to help new users get started easily.
> >> >
> >> > That said, I am also +1 on the idea of providing both variants as
> >> > mentioned by Nick, which might not even need any such script.
> >> >
> >> > Also, I am willing to volunteer to help with this effort. Please let me
> >> > know if anything is needed.
> >> >
> >> > Thanks,
> >> > Nihal
> >> >
> >> >
> >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org> wrote:
> >> >
> >> > > This would be great cleanup, big +1 from me for all three of these
> >> > > adjustments, including the promotion of pe, ltt, and friends out of the
> >> > > test scope.
> >> > >
> >> > > I believe that we included hbase test jars because we used to freely mix
> >> > > classes needed for minicluster between runtime and test jars, which in
> >> > > turn relied on Hadoop minicluster capabilities. The big cleanup around
> >> > > HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
> >> > >
> >> > > I believe that we include a Hadoop distribution in our assembly because
> >> > > that makes it easy for a new user to download our release bin.tgz and get
> >> > > started immediately with learning. I guess it's high time that we work
> >> > > out the with- and without-Hadoop variants.
> >> > >
> >> > > Thanks,
> >> > > Nick
> >> > >
> >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org> wrote:
> >> > >
> >> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped
> >> > > > out to achieve this; this is about discussing whether we even want to
> >> > > > make these changes.
> >> > > > These are also substantial changes, but they could be targeted for
> >> > > > HBase 3.0.
> >> > > >
> >> > > > One issue I have noticed is that we ship test jars and test
> >> > > > dependencies in the assembly.
> >> > > > I can't see anyone using those, but they bloat the assembly and
> >> > > > classpath, and add unnecessary JARs with possible CVE issues (for
> >> > > > example Kerby, which is a Hadoop minicluster dependency).
> >> > > >
> >> > > > My proposal is to exclude the test jars and the test scope
> >> > > > dependencies from the assembly.
> >> > > >
> >> > > > The advantages would be:
> >> > > > * Smaller distro size
> >> > > > * Faster startup (this is marginal)
> >> > > > * Fewer CVE-prone JARs in the binary assemblies
> >> > > >
> >> > > > The other issue is that the assembly includes much of the Hadoop
> >> > > > distribution.
> >> > > > The basic assumption in all scripts and instructions is that the node
> >> > > > has a fully configured Hadoop installation, and we include it in the
> >> > > > classpath of HBase.
> >> > > >
> >> > > > If that is true, then there is no reason to include Hadoop in the
> >> > > > assembly; HBase and its direct dependencies should be enough.
> >> > > >
> >> > > > One could argue that it would simplify the client side, which is true
> >> > > > to some extent (though 95% of the client distro use cases are served
> >> > > > better by simply using hbase-shaded-client).
> >> > > >
> >> > > > We could either remove the Hadoop libraries from one or both of the
> >> > > > assemblies unconditionally, or provide two variants of one or both
> >> > > > assemblies, one with Hadoop included and one without it.
> >> > > > Spark already does this; it has binary distributions both with and
> >> > > > without Hadoop.
> >> > > >
> >> > > > The advantages would be:
> >> > > > * Smaller distro size
> >> > > > * Faster startup (this is marginal)
> >> > > > * Less chance of conflicts with the Hadoop jars
> >> > > > * Fewer CVE-prone JARs in the binary assemblies
> >> > > >
> >> > > >
> >> > > > Thirdly, we could consider excluding the full-fat
> >> > > > org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less binary
> >> > > > assemblies. It is not used by the assembly, and AFAIK it is not
> >> > > > included in any of the 'hbase classpath' command variants.
> >> > > >
> >> > > > This would make sure that no Hadoop libraries are included (even in
> >> > > > shaded form) and would make the HBase distribution fully insulated
> >> > > > from Hadoop's CVE issues.
> >> > > >
> >> > > > (The full-fat hbase-shaded-client works best as a direct build-time
> >> > > > dependency anyway)
> >> > > >
> >> > > > best regards
> >> > > > Istvan
> >
> > --
> > *István Tóth* | Sr. Staff Software Engineer
> > *Email*: st...@cloudera.com
> > cloudera.com <https://www.cloudera.com>