Hi,

I'd like to get folks' opinions on having a public cluster for performance testing Hive code and getting an early read on whether a commit / build has caused a performance degradation over existing code.
There are already well-known workloads available, for example TPC-DS (https://github.com/hortonworks/hive-testbench), that can be run, so I'm not talking about the performance test code itself (although that should be as easy as possible to run on top of a dedicated cluster).

The benefits to the community would be:

- A dedicated environment, not necessarily leaving it to the vendors to integrate open-source later into their stacks and only find out some time later about performance problems
- Something that can be left set up & running: no setup and tear-down process needed every time a performance run is required
- An automated process for performance testing: no manual setup or intervention

Concerns:

- Budget
- Who administers the cluster, i.e. who sets it up and fixes it when it is down

I'd like to get some opinions on what the process for getting this to happen would be, bearing in mind that certain things may well be obstacles (budget) that have to be solved upfront before anything else happens:

- Budget approval
- Approval / Sign off - how & who?
- Architecture / pipeline design
- Implementation

Thanks, all opinions welcome.

Eugene