Thanks for the suggestion, but that is not realistic: HBase and Phoenix are WAY too tightly coupled for that. The Phoenix client JVM doesn't do that much; the majority of the work and the interesting stuff happens in the HBase cluster.
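To illustrate the point, here is a rough sketch of what a typical minicluster-based Phoenix IT looks like. This is not an actual Phoenix test (the real ones go through shared base classes, and the class name, table name and config knob below are made up for illustration), but the shape is the same: start an in-process HBase minicluster, connect through the Phoenix JDBC driver, and all the coprocessor-side logic runs in the same JVM, with the internal HBase state directly reachable from the test code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.hadoop.hbase.HBaseTestingUtility;

public class MiniClusterSketch {
    public static void main(String[] args) throws Exception {
        // Start an in-process HBase cluster (Master + RegionServer + ZK) in seconds.
        HBaseTestingUtility util = new HBaseTestingUtility();
        // Server-side settings can be tweaked per test before startup; on a real or
        // Dockerized cluster this would mean a config push plus a restart.
        util.getConfiguration().setLong("hbase.client.scanner.caching", 100);
        util.startMiniCluster();
        try {
            // The Phoenix client is just a JDBC driver; most of the execution
            // happens in coprocessors inside the (here: in-process) region servers.
            String url = "jdbc:phoenix:localhost:" + util.getZkCluster().getClientPort();
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.createStatement().execute(
                    "CREATE TABLE T (ID INTEGER PRIMARY KEY, V VARCHAR)");
                conn.createStatement().execute("UPSERT INTO T VALUES (1, 'x')");
                conn.commit();
                try (ResultSet rs = conn.createStatement().executeQuery("SELECT V FROM T")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
                // Because everything runs in one JVM, the test can also reach into
                // util.getHBaseCluster() / util.getAdmin() to inspect or manipulate
                // internal cluster state directly - the part that is hard to do
                // across a Docker boundary.
            }
        } finally {
            util.shutdownMiniCluster();
        }
    }
}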
There is a subset of tests which CAN be run on separate clusters, and there has been recent work to improve on that. Your suggestion of using Docker would make sense for those; for example, it could be used for testing interoperability of different Phoenix client and server versions, and upgrades (though we already have a - somewhat clumsy - test framework for that).

CLDR does have an internal test suite running on real clusters, and 90% of the effort and time there is spent on configuring the clusters and restarting them. It is a good fit for some tests, mostly those involving larger amounts of data, and for non-functional tests like using cloud connectors, but a bad fit for functional / integration tests. A similar Docker-based system could probably improve on the configuration complexity and restart times, but it still could not be as fast as the very stripped-down minicluster. IIUC SFDC is also working on something similar, but I don't know the exact architecture of their internal tests.

A significant number of tests require very specifically configured HBase instances with custom code and internal HBase state management, which would
- take a huge amount of time and resources to start and run (as opposed to the VERY optimized miniclusters)
- require bridging the different environments to manipulate and verify the internal private state of the HBase cluster
- require implementing a separate configuration management framework for the stack (something like Ambari)

Phoenix already needs to be built for specific HBase versions, with each branch only supporting a limited number of HBase versions, for which we can run the tests (and we are already bad at keeping the tests stable, even with our current coverage).

On the other hand, Phoenix uses very few Hadoop features directly; it mostly goes through HBase APIs. All the heavy lifting for Hadoop compatibility is done in HBase, so there is very little to test from Phoenix itself in that regard.

In short, this would take an amount of resources - mostly developer time, but also computing resources for tests - that the project does not have, for a rather limited payoff.

Istvan

On Tue, Sep 10, 2024 at 11:17 AM Grzegorz Kokosiński <g.kokosin...@gmail.com> wrote:

> Hey,
>
> I am sorry I have joined the discussion late.
>
> If possible, I would suggest stopping the use of mini clusters. I believe it is
> way more convenient to use Docker with Hadoop or HBase services. Notice
> that it is easy to test against multiple different versions, and the
> environment is separated from the project classpath. In Trino that approach
> made testing easier.
>
> That way we could verify compatibility with old software even if we were
> using newer libraries (Hadoop and others).

-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>