Hi Greg! The idea is very good, especially having these pre-built performance tests for release testing.
In your opinion, are the tests going to be self-contained, or will they need a
cluster (YARN, Mesos, Docker, etc.) to bring up a Flink cluster and run things?

Greetings,
Stephan

On Sat, Apr 9, 2016 at 12:41 PM, Gábor Gévay <gga...@gmail.com> wrote:
> Hello,
>
> I think that creating a macro-benchmarking module would be a very good
> idea. It would make doing performance-related changes much easier and
> safer.
>
> I have also used Peel, and can confirm that it would be a good fit for
> this task.
>
> > I've also been looking recently at some of the hot code and see about a
> > ~12-14% total improvement when modifying NormalizedKeySorter.compare/swap
> > to bitshift and bitmask rather than divide and modulo. The trade-off is
> > that to align on a power-of-2 we have holes in the MemoryBuffers and
> > require additional ones.
>
> I've also noticed the performance problem that those divisions in
> NormalizedKeySorter.compare/swap cause, and have an idea about
> eliminating them without the align-on-power-of-2 trade-off. I've
> opened a Jira [1], where I explain it.
>
> Best,
> Gábor
>
> [1] https://issues.apache.org/jira/browse/FLINK-3722
>
>
> 2016-04-06 18:56 GMT+02:00 Greg Hogan <c...@greghogan.com>:
> > I'd like to discuss the creation of a macro-benchmarking module for Flink.
> > This could be run during pre-release testing to detect performance
> > regressions and during development when refactoring or performance tuning
> > code on the hot path.
> >
> > Many users have published benchmarks and the Flink libraries already
> > contain a modest selection of algorithms. Some benefits of creating a
> > consolidated collection of macro-benchmarks include:
> >
> > - comprehensive code coverage: a diverse set of algorithms can stress
> >   every aspect of Flink (streaming, batch, sorts, joins, spilling,
> >   cluster, ...)
> >
> > - codify best practices: benchmarks should be relatively stable and
> >   repeatable
> >
> > - efficient: an automated system can run many more tests and generate
> >   more accurate results
> >
> > Macro-benchmarks would be useful in analyzing improved performance with
> > the proposed specialized serializers and comparators [FLINK-3599] or
> > making Flink NUMA-aware [FLINK-3163].
> >
> > I've also been looking recently at some of the hot code and see about a
> > ~12-14% total improvement when modifying NormalizedKeySorter.compare/swap
> > to bitshift and bitmask rather than divide and modulo. The trade-off is
> > that to align on a power-of-2 we have holes in the MemoryBuffers and
> > require additional ones. And I'm testing on a single data type, IntValue,
> > and there may be different results for LongValue or StringValue or custom
> > types or with different algorithms. And replacing multiply with a left
> > shift reduces performance, demonstrating the need to test changes in
> > isolation.
> >
> > There are many more ideas, e.g. NormalizedKeySorter writing keys before
> > the pointer so that the offset computation is performed outside of the
> > compare and sort methods. Also, SpanningRecordSerializer could skip to
> > the next buffer rather than writing the length across buffers. These
> > changes might each be worth a few percent. Other changes might be less
> > than a 1% speedup, but taken in aggregate they will yield a noticeable
> > performance increase.
> >
> > I like the idea of profile first, measure second, then create and discuss
> > the pull request.
> >
> > As for the actual macro-benchmarking framework, it would be nice if the
> > algorithms would also verify correctness alongside performance. The
> > algorithm interface would be warmup (run only once) and execute, which
> > would be run multiple times in an interleaved manner. The benchmarking
> > duration should be tunable.
> >
> > The framework would be responsible for configuring, starting, and
> > stopping the cluster, executing algorithms and recording performance,
> > and comparing and analyzing results.
> >
> > Greg
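
To make the divide/modulo-vs-shift/mask trade-off from the quoted thread
concrete, here is a minimal sketch. The class and field names are made up for
illustration and this is not the actual NormalizedKeySorter code; it only shows
how rounding the per-segment record count up to a power of two turns the index
and offset computation into a shift and a mask, at the cost of unused slots at
the end of each segment.

    // Illustrative sketch only; not the actual NormalizedKeySorter code.
    public class SegmentOffsets {

        private final int recordsPerSegment; // records actually stored per segment
        private final int shift;             // log2 of the power-of-two capacity
        private final int mask;              // power-of-two capacity minus one

        public SegmentOffsets(int recordsPerSegment) {
            // assumes recordsPerSegment >= 2
            this.recordsPerSegment = recordsPerSegment;
            int capacity = Integer.highestOneBit(recordsPerSegment - 1) << 1;
            this.shift = Integer.numberOfTrailingZeros(capacity);
            this.mask = capacity - 1;
        }

        // dense layout: one integer division and one remainder per lookup
        int segmentIndexDiv(int record)  { return record / recordsPerSegment; }
        int segmentOffsetDiv(int record) { return record % recordsPerSegment; }

        // power-of-two layout: shift and mask only, but each segment has unused
        // slots ("holes") whenever recordsPerSegment is not a power of two
        int segmentIndexShift(int record)  { return record >>> shift; }
        int segmentOffsetShift(int record) { return record & mask; }
    }

Whether the saved divisions outweigh the extra MemoryBuffers is exactly the
kind of question the proposed macro-benchmarks should answer, per data type
(IntValue, LongValue, StringValue, custom types).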
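For the warmup/execute interface described above, a rough sketch of what the
contract and an interleaved driver could look like follows. MacroBenchmark and
BenchmarkRunner are placeholder names, not an existing Flink or Peel API.

    import java.util.List;

    // Placeholder contract: warmup runs once, execute runs repeatedly and
    // also reports whether the result passed a correctness check.
    public interface MacroBenchmark {

        /** Run exactly once before measurement, e.g. to generate input data. */
        void warmup() throws Exception;

        /** Run repeatedly; returns true if the result verified as correct. */
        boolean execute() throws Exception;
    }

    class BenchmarkRunner {

        /** Interleaves the benchmarks round-robin so that JIT, GC, and cluster
         *  noise are spread across all of them; the round count makes the
         *  overall benchmarking duration tunable. */
        static void run(List<MacroBenchmark> benchmarks, int rounds) throws Exception {
            for (MacroBenchmark benchmark : benchmarks) {
                benchmark.warmup();
            }
            for (int round = 0; round < rounds; round++) {
                for (MacroBenchmark benchmark : benchmarks) {
                    long start = System.nanoTime();
                    boolean correct = benchmark.execute();
                    long millis = (System.nanoTime() - start) / 1_000_000;
                    System.out.printf("%s round %d: %d ms, correct=%b%n",
                            benchmark.getClass().getSimpleName(), round, millis, correct);
                }
            }
        }
    }

Recording would presumably go somewhere more structured than stdout so the
framework can compare results across releases, but the interleaving and the
tunable round count are the parts that matter for repeatability.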