Thanks for that, Stamatis. Plenty of food for thought there. What do you think would be the best way to get sponsors on board - when they read/contribute here, for example?
From the list of requirements to start a VM, the following could be used as part of the process, I imagine:

Maintainers: "Provide the name, Apache ID, and contact info for at least three PMC members who will maintain the vm" - read “maintain cluster” here or perhaps this would be the sponsor

On Tue, May 21, 2024 at 1:36 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hey Eugene,
>
> Having a cluster for performance testing is a great idea and it is
> something that has popped up in various contexts.
>
> The most common way to obtain such clusters is via sponsors (companies
> or individuals) donating resources to the project. For example, the
> Hive CI is now running mostly on resources donated by Cloudera.
>
> There seems to be a process about requesting resources from the Apache
> Infra team [1] but I am not aware of other ASF projects following this
> path for performance testing. Most likely the easiest and fastest way
> to move this forward is through a sponsor. Depending on where the
> resources come from will also determine the design, implementation,
> and maintenance.
>
> Best,
> Stamatis
>
> [1] https://infra.apache.org/vm-for-project.html
>
> On Tue, May 21, 2024 at 11:25 AM Eugene Ryan <ryan.eug...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'd like to get folks' opinions on having a public cluster for performance
> > testing Hive code and getting an early read on whether a commit / build has
> > caused a performance degradation over existing code.
> >
> > There are already well known workloads available, for example, TPC-DS
> > (https://github.com/hortonworks/hive-testbench) that can be run so I'm not
> > talking about performance test code itself (although that should be as easy
> > as possible on top of a dedicated cluster).
> >
> > The benefits to the community would be:
> > - A dedicated environment, not necessarily leaving it to the vendors
> >   to integrate open-source later into their stacks and only find out some
> >   time later about performance problems
> > - Something that can be left set up & running - no setup and tear-down
> >   process needed every time a performance run is required
> > - An automated process for performance testing - no manual setup or
> >   intervention
> >
> > Concerns:
> > - Budget
> > - Who administers the cluster, i.e. who sets it up, fixes it when down
> >
> > I'd like to get some opinions on what the process for getting this to
> > happen would be, bearing in mind that certain things may well be
> > obstacles (budget) that have to be solved upfront before anything else
> > happens:
> > - Budget approval
> > - Approval / Sign off - how & who?
> > - Architecture / pipeline design
> > - Implementation
> >
> > Thanks, all opinions welcome.
> > Eugene
> >

--
Eugene