Hi,

Welcome to the Flink community, and thanks for including Flink in your
benchmark suite! That's really exciting news :-)

Most of the jobs that you listed in your preliminary plan are available as
example programs in Flink's code base [1].
However, you should know that these examples are NOT tuned for performance;
they are written to be easy to understand and to showcase certain features.

If your implementations of the Flink programs are available online (e.g., on
GitHub), we could assist with some performance tuning.

Thank you,
Fabian

[1]
https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java


2015-07-20 10:19 GMT+02:00 Stephan Ewen <se...@apache.org>:

> Hi!
>
> Thanks for reaching out and adding Flink to BigDataBench.
>
> The plan you sent looks like a nice first draft. It consists almost
> entirely of batch jobs. Here are a few ideas for what you could add as
> batch jobs:
>
>  - Joins are something people seem to do a lot with these systems, so a
> 2-3 table join would be a nice addition
>
>  - For batch algorithms, it is often interesting to scale them beyond
> memory (we have seen that a lot from users)
>
>  - For graph algorithms, you can try incremental versions (see here:
> http://data-artisans.com/data-analysis-with-flink.html)
>
>
>
> On the streaming side, it is harder, as the systems are very different
> there and not every system can do everything.
> For Flink, some ideas would be:
>   - Streaming Grep
>   - Streaming pattern detection (see
> https://github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine)
>   - Streaming word count
>   - For streaming jobs, it is often interesting to compare runs with
> fault tolerance enabled and disabled
>
>
>
> A few generic comments on Flink for performance testing:
>
>  - The Java API is usually slightly faster than the Scala API, but only by
> a bit
>  - Tuples (Java) and case classes (Scala) usually beat POJOs in
> performance.
>  - If your implementation allows it, turning on object reuse (via
> "enableObjectReuse()" on the ExecutionConfig) can gain some performance.
>  - If you implement sorting / TeraSort, have a look here for how to make
> sure that Flink handles the Hadoop types efficiently:
> http://eastcirclek.blogspot.kr/2015/06/terasort-for-spark-and-flink-with-range.html
>
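
A note on the "if your implementation allows it" condition for object reuse. As I understand it, with reuse enabled the runtime may hand the same mutable instance to a user function on every call, so code that keeps references to its inputs can silently see overwritten values. The following is a minimal plain-Java sketch of that hazard, not Flink code. The class ObjectReuseSketch, the Record type, and both methods are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; nothing here is part of the Flink API.
class ObjectReuseSketch {

    /** Mutable record standing in for a deserialized element. */
    static final class Record {
        int value;
    }

    /**
     * Keeps references to its inputs across iterations.
     * This is only correct when every input is a distinct object.
     */
    static List<Integer> bufferReferences(int[] inputs, boolean reuse) {
        List<Record> buffered = new ArrayList<>();
        Record shared = new Record();
        for (int v : inputs) {
            // With reuse, the same instance is overwritten for each element.
            Record r = reuse ? shared : new Record();
            r.value = v;
            buffered.add(r); // holds on to the reference
        }
        List<Integer> out = new ArrayList<>();
        for (Record r : buffered) {
            out.add(r.value);
        }
        return out;
    }

    /**
     * Copies a primitive out of each input before moving on.
     * This stays correct whether or not the input object is reused.
     */
    static int sum(int[] inputs, boolean reuse) {
        Record shared = new Record();
        int total = 0;
        for (int v : inputs) {
            Record r = reuse ? shared : new Record();
            r.value = v;
            total += r.value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] inputs = {1, 2, 3};
        System.out.println(bufferReferences(inputs, false));
        System.out.println(bufferReferences(inputs, true));
        System.out.println(sum(inputs, true));
    }
}
```

With the input {1, 2, 3}, the buffering method returns 1, 2, 3 when each element is a fresh object, but three copies of the last value when one instance is reused. The summing method is unaffected because it reads what it needs before the next element arrives. A benchmark implementation of the first kind would need to copy its inputs before object reuse is safe to enable.
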
> Greetings,
> Stephan
>
>
>
> On Mon, Jul 20, 2015 at 9:47 AM, Xinhui Tian <tianxin...@ict.ac.cn> wrote:
>
> > Hello, everyone.
> >
> > I'm a PhD student from the Institute of Computing Technology, Chinese
> > Academy of Sciences. Our team has released a benchmark for big data
> > systems called BigDataBench, which has become an industry-standard big
> > data benchmark in China. You can find our work on this website:
> > http://prof.ict.ac.cn/BigDataBench/
> >
> > We are now planning to support Flink in our benchmark, which could
> > provide a set of workloads in different domains and an objective
> > comparison with systems such as Spark and Hadoop. But we are new to this
> > system, so we are asking for your advice about benchmark design. The
> > first thing is to decide which workloads should be added to our
> > benchmark and which domains we should pay more attention to.
> >
> > The attachment is a preliminary plan, which lists some workloads that
> > have already been implemented in the Spark version. We plan to first
> > implement these workloads on Flink and evaluate the two systems. Does
> > anyone have advice on this list? We will be very grateful for any ideas.
> > BigDataBench_for_Flink.docx
> > <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/file/n7079/BigDataBench_for_Flink.docx>
> >
> > Thanks ;)
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Benchmarks-of-Flink-supporting-Flink-in-BigDataBench-tp7079.html
> > Sent from the Apache Flink Mailing List archive at Nabble.com.
> >
>
