Hi,

I'd like to get folks' opinions on having a public cluster for performance
testing Hive code and getting an early read on whether a commit / build has
caused a performance degradation over existing code.

There are already well-known workloads available, for example TPC-DS
(https://github.com/hortonworks/hive-testbench), that can be run, so I'm not
talking about the performance test code itself (although running it should be
as easy as possible on top of a dedicated cluster).

The benefits to the community would be:
   - A dedicated environment, rather than leaving it to the vendors to
   integrate open-source code into their stacks and only find out about
   performance problems some time later
   - Something that can be left set up & running - no setup and tear-down
   process needed every time a performance run is required
   - An automated process for performance testing - no manual setup or
   intervention
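
To make the "early read" idea a bit more concrete, here is a rough sketch of
the kind of check an automated run could make after each build: compare
per-query runtimes from a baseline build against the candidate build and flag
anything that got noticeably slower. This is purely illustrative. The function
name, the 10% threshold and the query names are my own assumptions, not
anything that exists in Hive or hive-testbench today.

```python
from statistics import median


def find_regressions(baseline, candidate, threshold=0.10):
    """Return queries whose median runtime grew by more than `threshold`.

    `baseline` and `candidate` map a query name to a list of runtimes in
    seconds (hypothetical input format, one entry per repeated run).
    """
    regressions = {}
    for query, base_runs in baseline.items():
        cand_runs = candidate.get(query)
        if not base_runs or not cand_runs:
            continue  # query missing from one side, nothing to compare
        base = median(base_runs)
        cand = median(cand_runs)
        # Relative slowdown of the candidate build against the baseline.
        if base > 0 and (cand - base) / base > threshold:
            regressions[query] = (base, cand)
    return regressions


# Made-up numbers, only to show the shape of the input and output.
baseline = {"query3": [10.0, 11.0, 10.5], "query7": [20.0, 20.5, 19.5]}
candidate = {"query3": [10.2, 10.4, 10.6], "query7": [25.0, 26.0, 24.0]}
print(find_regressions(baseline, candidate))
```

Using the median over several runs is just one way to damp noise on a shared
cluster; the real comparison method would be part of the pipeline design.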

Concerns:
   - Budget
   - Who administers the cluster, i.e. who sets it up and fixes it when it
   goes down

I'd like to get some opinions on what the process for making this happen
would be, bearing in mind that certain obstacles (budget, for one) may have
to be solved upfront before anything else happens:
   - Budget approval
   - Approval / sign-off - how & who?
   - Architecture / pipeline design
   - Implementation

Thanks, all opinions welcome.
Eugene
