Re: benchmark choices

Konstantin Boudnik Tue, 22 Feb 2011 05:30:08 -0800

Adding Roman Shaposhnik, who's "tasked" with benchmarking @Cloudera, to the list.


On Mon, Feb 21, 2011 at 12:39, Shrinivas Joshi <[email protected]> wrote:
> I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo
> etc. look at for the purpose of benchmarking. I guess GridMix v3 might be of
> more interest to Yahoo.
>
> I would appreciate it if someone could comment more on this.
>
> Thanks,
> -Shrinivas
>
> On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Boudnik <[email protected]> wrote:
>>
>> On Fri, Feb 18, 2011 at 14:35, Ted Dunning <[email protected]> wrote:
>> > I just read the MalStone report.  They report times for a Java version that
>> > is many times (5x) slower than for a streaming implementation.  That single
>> > fact indicates that the Java code is so appallingly bad that this is a very
>> > bad benchmark.
>>
>> Slow Java code? That's funny ;) Running with Hotspot on by any chance?
>>
>> > On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout <[email protected]> wrote:
>> >
>> >> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
>> >> data and the queries, if not the query generator. There is a Jira issue in
>> >> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
>> >> don't remember the issue number offhand.
>> >>
>> >> -----Original Message-----
>> >> From: Shrinivas Joshi [mailto:[email protected]]
>> >> Sent: Friday, February 18, 2011 3:32 PM
>> >> To: [email protected]
>> >> Subject: benchmark choices
>> >>
>> >> Which workloads are used for serious benchmarking of Hadoop clusters? Do
>> >> you care about any of the following workloads:
>> >> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
>> >> sample apps shipped with Hadoop distro like PiEstimator, dbcount, etc.
>> >>
>> >> Thanks,
>> >> -Shrinivas
>> >>
>> >>
>> >
>
>
