Thanks for the suggestion, but that is not realistic: HBase and Phoenix are
WAY too tightly coupled for that.
The Phoenix client JVM doesn't do that much; the majority of the work and
the interesting stuff happens in the HBase cluster.

There is a subset of tests which CAN be run on separate clusters, and
there has been recent work to improve on that.
Your suggestion of using Docker would make sense for those; for example, it
could be used for testing interoperability of different Phoenix client and
server versions, and upgrades (though we already have a - somewhat clumsy -
test framework for that).
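For that subset a client-only test needs little more than a JDBC URL
pointing at wherever the containers run. A very rough sketch (the
PHOENIX_TEST_ZK_QUORUM variable is made up for illustration, and it assumes
a Phoenix-enabled HBase is already reachable there):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class ExternalClusterSmokeTest {
        public static void main(String[] args) throws Exception {
            // e.g. "zk-from-docker-compose:2181", injected by whatever
            // starts the containers
            String quorum = System.getenv().getOrDefault(
                "PHOENIX_TEST_ZK_QUORUM", "localhost:2181");
            try (Connection conn =
                    DriverManager.getConnection("jdbc:phoenix:" + quorum)) {
                conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS SMOKE "
                    + "(ID INTEGER PRIMARY KEY, V VARCHAR)");
                conn.createStatement().execute(
                    "UPSERT INTO SMOKE VALUES (1, 'ok')");
                conn.commit();  // Phoenix connections are not auto-commit
                try (ResultSet rs = conn.createStatement().executeQuery(
                        "SELECT V FROM SMOKE WHERE ID = 1")) {
                    rs.next();
                    System.out.println("server answered: " + rs.getString(1));
                }
            }
        }
    }

The only cluster-specific piece is the quorum address, so pointing the same
client-side test at containers running different server versions would
indeed be straightforward.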

CLDR does have an internal test suite running on real clusters; 90% of the
effort and time is spent on configuring the clusters and restarting them.
It is a good fit for some tests, mostly those involving larger amounts of
data, and for non-functional tests like using cloud connectors, but a bad
fit for functional / integration tests.
A similar Docker-based system could probably improve on the configuration
complexity and restart times, but it still could not be as fast as the
very stripped-down minicluster.

IIUC SFDC is also working on something similar, but I don't know the exact
architecture of their internal tests.

A significant number of tests require very specifically configured HBase
instances with custom code and internal HBase state management, which would
- take a huge amount of time and resources to start and run (as opposed to
the VERY optimized miniclusters)
- require bridging the different environments to manipulate and verify the
internal private state of the HBase cluster
- require implementing a separate configuration management framework for
the stack (something like Ambari)
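To illustrate the contrast: with the minicluster, everything above is just
local objects inside the test JVM. A very rough sketch (not our actual test
harness; it assumes the Phoenix server classes are on the test classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseTestingUtility;
    import org.apache.hadoop.hbase.TableName;

    public class MiniClusterSketch {
        public static void main(String[] args) throws Exception {
            HBaseTestingUtility util = new HBaseTestingUtility();

            // Custom server-side configuration is just a local object;
            // no external configuration management is needed.
            Configuration conf = util.getConfiguration();
            conf.setInt("hbase.regionserver.handler.count", 5);

            util.startMiniCluster(1);  // the whole cluster runs in this JVM
            try {
                String url = "jdbc:phoenix:localhost:"
                    + util.getZkCluster().getClientPort();
                try (Connection conn = DriverManager.getConnection(url)) {
                    conn.createStatement().execute(
                        "CREATE TABLE IF NOT EXISTS T (ID INTEGER PRIMARY KEY)");
                }
                // Internal HBase state is directly reachable from the test,
                // which an external (Docker) cluster could not offer without
                // extra bridging.
                System.out.println(util.getHBaseCluster()
                    .getRegions(TableName.valueOf("T")));
            } finally {
                util.shutdownMiniCluster();
            }
        }
    }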

Phoenix already needs to be built for specific HBase versions, with each
branch supporting only a limited number of HBase versions for which we can
run the tests (and we are already struggling to keep the tests stable, even
with our current coverage).

On the other hand, Phoenix uses very few Hadoop features directly; it
mostly goes through HBase APIs.
All the heavy lifting for Hadoop compatibility is done in HBase, so there
is very little to test from the Phoenix side in that regard.

In short, this would take an amount of resources - mostly developer time,
but also computing resources for tests - that the project does not have,
for a rather limited payoff.

Istvan

On Tue, Sep 10, 2024 at 11:17 AM Grzegorz Kokosiński <g.kokosin...@gmail.com>
wrote:

> Hey,
>
> I am sorry I have joined the discussion late.
>
> If possible I would suggest to stop using mini clusters. I believe it is
> way more convenient to use docker with hadoop or hbase services. Notice
> that it is easy to test against multiple different versions, the
> environment is separated from the project classpath. In Trino that approach
> made testing easier.
>
> That way we could verify compatibility with old software even if we would
> be using newer libraries (hadoop and others).
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
