[ 
https://issues.apache.org/jira/browse/FLINK-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186743#comment-14186743
 ] 

Mingliang Qi commented on FLINK-1195:
-------------------------------------

I'm currently working on the benchmark as my bachelor thesis, on top of 
Alexander's [peel|https://github.com/citlab/peel] infrastructure. It configures 
the system and runs the experiments automatically. All you need to do is write 
an XML file that contains things like inputs, DOP, output, repetitions, and the 
command - just as you mentioned. One of his bachelor students is also working 
on generating the plots automatically after the experiments.
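
For illustration, such an experiment definition could look roughly like the 
sketch below. This is purely hypothetical - the element names are made up here 
to mirror the fields mentioned above (inputs, DOP, output, repetitions, 
command) and are not Peel's actual schema:

{code}
<!-- hypothetical experiment definition, for illustration only -->
<experiment name="als-benchmark" repetitions="10">
  <input>hdfs:///benchmarks/als-input</input>
  <output>benchmarkResults</output>
  <dop>1, 2, 4, 8</dop>
  <command>benchmark.jar -p ${dop} ${input}</command>
</experiment>
{code}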

> Improvement of benchmarking infrastructure
> ------------------------------------------
>
>                 Key: FLINK-1195
>                 URL: https://issues.apache.org/jira/browse/FLINK-1195
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Till Rohrmann
>
> I noticed while running my ALS benchmarks that we still have some potential 
> to improve our benchmarking infrastructure. The current state is that we 
> execute the benchmark jobs by writing a script with a single set of 
> parameters. The runtime is then manually retrieved from the web interface of 
> Flink and Spark, respectively.
> I think we need the following extensions:
> * Automatic runtime retrieval and storage in a file
> * Repeated execution of jobs to gather some "advanced" statistics such as 
> mean and standard deviation of the runtimes
> * Support for value sets for the individual parameters
> The automatic runtime retrieval would allow us to execute several benchmarks 
> consecutively without having to look up the runtimes in the logs or in the 
> web interface, which, by the way, only stores the runtimes of the last 5 jobs.
> What I mean by value sets is that it would be nice to specify a set of 
> parameter values for which the benchmark is run, without having to write a 
> separate benchmark script for every single parameter combination. I believe 
> this feature would come in very handy when we want to look at the runtime 
> behaviour of Flink for different input sizes or degrees of parallelism, for 
> example. To illustrate what I mean:
> {code}
> INPUTSIZE = 1000, 2000, 4000, 8000
> DOP = 1, 2, 4, 8
> OUTPUT = benchmarkResults
> REPETITIONS = 10
> COMMAND = benchmark.jar -p $DOP $INPUTSIZE
> {code} 
> Something like that would execute the benchmark job with (DOP=1, 
> INPUTSIZE=1000), (DOP=2, INPUTSIZE=2000), ..., 10 times each, calculate 
> runtime statistics for each parameter combination, and store the results in 
> the file benchmarkResults.
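> As a sketch of what such a driver could look like (hypothetical Python; the 
> Flink invocation and timing via the client call are illustrative only - a 
> real runner would read the actual job runtime from the job client output):
> {code}
> import statistics
> import subprocess
> import time
>
> INPUT_SIZES = [1000, 2000, 4000, 8000]
> DOPS = [1, 2, 4, 8]
> REPETITIONS = 10
>
> with open("benchmarkResults", "w") as out:
>     # Pair the value sets as in the example above: (1, 1000), (2, 2000), ...
>     for dop, size in zip(DOPS, INPUT_SIZES):
>         runtimes = []
>         for _ in range(REPETITIONS):
>             start = time.perf_counter()
>             # Illustrative invocation; details depend on the actual job.
>             subprocess.run(
>                 ["flink", "run", "-p", str(dop), "benchmark.jar", str(size)],
>                 check=True)
>             runtimes.append(time.perf_counter() - start)
>         out.write("DOP=%d INPUTSIZE=%d mean=%.2fs stdev=%.2fs\n" % (
>             dop, size, statistics.mean(runtimes),
>             statistics.stdev(runtimes)))
> {code}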
> I believe that spending some effort now will pay off in the long run because 
> we will benchmark Flink continuously. What do you guys think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
