Other ideas that have been thrown around:

- Compile a donated collection of real-world datasets that can be used in tests
- Ability to replay WALs: https://issues.apache.org/jira/browse/HBASE-6218 (rough sketch below)
- Find someone to donate a cluster of machines the tests can be run on, to ensure consistent results
- Integrate with iTest/Bigtop
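To make the WAL replay idea a bit more concrete, here is a minimal sketch of what a replay-style driver could start from. It assumes the 0.94-era HLog reader API (HLog.getReader); the path handling and the idea of just counting edits are invented for illustration and are not taken from HBASE-6218, which would need the edits re-applied rather than counted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.regionserver.wal.HLog;

// Hypothetical driver that walks one WAL file and counts its edits.
// A real replay tool (what HBASE-6218 asks for) would turn each KeyValue
// back into a Put/Delete against a target table instead of just counting.
public class WalReplaySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path walFile = new Path(args[0]);  // a WAL file under the region server's .logs directory

    HLog.Reader reader = HLog.getReader(fs, walFile, conf);  // 0.94-era API; newer releases differ
    try {
      long entries = 0;
      long keyValues = 0;
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        entries++;
        keyValues += entry.getEdit().getKeyValues().size();  // replay would re-apply these mutations
      }
      System.out.println("WAL entries: " + entries + ", KeyValues: " + keyValues);
    } finally {
      reader.close();
    }
  }
}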
---
Ryan Ausanka-Crues
CEO
Palomino Labs, Inc.
[email protected]
(m) 805.242.2486

On Jun 21, 2012, at 3:05 PM, Matt Corgan wrote:

> just brainstorming =)
>
> Some of those are motivated by the performance tests I wrote for data block encoding:
> https://github.com/hotpads/hbase-prefix-trie/tree/master/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek
> In that directory:
>
> * SeekBenchmarkMain gathers all of the test parameters. Perhaps we could have a test configuration input file format where standard test configs are put in source control
> * For each combination of input parameters it runs a SingleSeekBenchmark
> * As it runs, the SingleSeekBenchmark adds results to a SeekBenchmarkResult
> * Each SeekBenchmarkResult is logged after each SingleSeekBenchmark, and all of them are logged again at the end for pasting into a spreadsheet
>
> They're probably too customized for my use case, but maybe we can draw ideas from the structure/workflow and make it applicable to more use cases.
>
>
> On Thu, Jun 21, 2012 at 2:47 PM, Andrew Purtell <[email protected]> wrote:
>
>> Concur. That's ambitious!
>>
>> On Thu, Jun 21, 2012 at 1:57 PM, Ryan Ausanka-Crues <[email protected]> wrote:
>>> Thanks Matt. These are great!
>>> ---
>>> Ryan Ausanka-Crues
>>> CEO
>>> Palomino Labs, Inc.
>>> [email protected]
>>> (m) 805.242.2486
>>>
>>> On Jun 21, 2012, at 12:36 PM, Matt Corgan wrote:
>>>
>>>> These are geared more towards development than regression testing, but here are a few ideas that I would find useful:
>>>>
>>>> * Ability to run the performance tests (or at least a subset of them) on a development machine would help people avoid committing regressions and would speed development in general
>>>> * Ability to test a single region without heavier-weight servers and clusters
>>>> * Letting the test run with multiple combinations of input parameters (block size, compression, blooms, encoding, flush size, etc.). Possibly many combinations that could take a while to run
>>>> * Output results to a CSV file that's importable to a spreadsheet for sorting/filtering/charting.
>>>> * Email the CSV file to the user notifying them the tests have finished.
>>>> * Getting fancier: ability to specify a list of branches or tags from git or subversion as inputs, which would allow the developer to tag many different performance changes and later figure out which combination is the best (all before submitting a patch)
>>>>
>>>>
>>>> On Thu, Jun 21, 2012 at 12:13 PM, Elliott Clark <[email protected]> wrote:
>>>>
>>>>> I actually think that more measurements are needed than just per release. The best I could hope for would be a four-node-plus cluster (one master and three slaves) that, for every check-in on trunk, runs multiple different perf tests:
>>>>>
>>>>> - All Reads (Scans)
>>>>> - Large Writes (should test compactions/flushes)
>>>>> - Read Dominated with 10% writes
>>>>>
>>>>> Then every check-in can be evaluated and large regressions can be treated as bugs. And with that we can see the difference between the different versions as well. http://arewefastyet.com/ is kind of the model that I would love to see. And I'm more than willing to help wherever needed.
>>>>>
>>>>> However, in reality, every night will probably be more feasible. And four nodes is probably not going to happen either.
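To pin down what the three mixes Elliott lists above might actually execute, here is a rough single-threaded sketch against the 0.94 client API. The table name, column family, value sizes, and operation counts are all invented; a real harness would need multiple client threads and proper latency/throughput reporting, this only shows the shape of the three workloads.

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical workload driver; the mix names follow Elliott's list.
public class MixedWorkloadSketch {

  enum Mix { ALL_SCANS, LARGE_WRITES, READ_DOMINATED_10PCT_WRITES }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "perf_test");   // assumes a pre-created test table
    byte[] family = Bytes.toBytes("f");
    byte[] qualifier = Bytes.toBytes("q");
    Mix mix = Mix.valueOf(args[0]);
    long ops = 1000000L;                            // invented operation count
    Random rng = new Random(42);                    // fixed seed so runs are comparable

    long start = System.currentTimeMillis();
    for (long i = 0; i < ops; i++) {
      byte[] row = Bytes.toBytes(rng.nextInt(10000000));
      if (mix == Mix.ALL_SCANS) {
        Scan scan = new Scan(row);
        scan.setCaching(100);
        ResultScanner scanner = table.getScanner(scan);
        for (int j = 0; j < 100 && scanner.next() != null; j++) {
          // drain a short scan from a random start row
        }
        scanner.close();
      } else if (mix == Mix.LARGE_WRITES) {
        // 1 KB values, enough data volume to exercise flushes and compactions
        table.put(new Put(row).add(family, qualifier, new byte[1024]));
      } else {
        // read dominated: roughly 10% writes, 90% point gets
        if (rng.nextInt(10) == 0) {
          table.put(new Put(row).add(family, qualifier, new byte[100]));
        } else {
          table.get(new Get(row));
        }
      }
    }
    System.out.println(mix + ": " + ops + " ops in "
        + (System.currentTimeMillis() - start) + " ms");
    table.close();
  }
}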
>>>>>
>>>>> On Thu, Jun 21, 2012 at 11:38 AM, Andrew Purtell <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Jun 20, 2012 at 10:37 PM, Ryan Ausanka-Crues <[email protected]> wrote:
>>>>>>> I think it makes sense to start by defining the goals for the performance testing project and then deciding what we'd like to accomplish. As such, I'll start by soliciting ideas from everyone on what they would like to see from the project. We can then collate those thoughts and prioritize the different features. Does that sound like a reasonable approach?
>>>>>>
>>>>>> In terms of defining a goal, the fundamental need I see for us as a project is to quantify performance from one release to the next, and thus be able to avoid regressions by noting adverse changes in release candidates.
>>>>>>
>>>>>> In terms of defining what "performance" means... well, that's an involved and separate discussion, I think.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>>>>>>
>>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>>
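
And to make the SeekBenchmarkMain / SingleSeekBenchmark / SeekBenchmarkResult workflow Matt describes up-thread (plus the CSV-for-spreadsheets idea) a bit more concrete, here is a minimal generic sketch. Every class, parameter, and file name in it is hypothetical; it is not the prefix-trie test code, just the shape of a driver that runs one benchmark per parameter combination and appends one CSV row per result.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sweep driver: one benchmark run per parameter combination,
// one CSV row per run, results importable into a spreadsheet.
public class BenchmarkSweepSketch {

  // One combination of input parameters (a made-up subset).
  static class Params {
    final int blockSizeKb;
    final String compression;   // e.g. NONE, GZ, LZO
    final boolean bloomFilter;
    Params(int blockSizeKb, String compression, boolean bloomFilter) {
      this.blockSizeKb = blockSizeKb;
      this.compression = compression;
      this.bloomFilter = bloomFilter;
    }
  }

  // Placeholder for a single run (the SingleSeekBenchmark role).
  static long runOnce(Params p) {
    long start = System.nanoTime();
    // ... set up a region/HFile with these parameters and perform N seeks/reads ...
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws IOException {
    int[] blockSizes = {4, 16, 64};
    String[] compressions = {"NONE", "GZ"};
    boolean[] blooms = {false, true};

    PrintWriter csv = new PrintWriter(new FileWriter("benchmark-results.csv"));
    csv.println("blockSizeKb,compression,bloomFilter,elapsedNanos");
    try {
      for (int bs : blockSizes) {
        for (String comp : compressions) {
          for (boolean bloom : blooms) {
            Params p = new Params(bs, comp, bloom);
            long elapsed = runOnce(p);
            // Log each result as it completes and keep everything in one CSV
            // for sorting/filtering/charting in a spreadsheet afterwards.
            csv.printf("%d,%s,%b,%d%n", p.blockSizeKb, p.compression, p.bloomFilter, elapsed);
            csv.flush();
          }
        }
      }
    } finally {
      csv.close();
    }
  }
}

The nested loops are where a versioned test-config file could plug in, which would also cover the "standard test configs in source control" idea.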
