[ 
https://issues.apache.org/jira/browse/FLINK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611879#comment-16611879
 ] 

Piotr Nowojski commented on FLINK-10320:
----------------------------------------

[~Tison] thanks for the proposal. Is the scheduling time a problem? Is it time 
critical thing? I guess that at least in most cases/scenarios/setups no. Did we 
have some performance regression there? I'm asking because more benchmarks is 
more code to maintain and more time to execute them.

My main concern regarding implementation are those mock/testing 
{{TaskExecutor}}. It would be best to either reuse some existing testing code 
or just setup the real ones, but if we want to benchmark scheduling for setups 
with hundreds of TM, they have to be super quick/efficient, other way we would 
overload the machine executing the benchmark.

> Introduce JobMaster schedule micro-benchmark
> --------------------------------------------
>
>                 Key: FLINK-10320
>                 URL: https://issues.apache.org/jira/browse/FLINK-10320
>             Project: Flink
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: 陈梓立
>            Assignee: 陈梓立
>            Priority: Major
>
> Based on {{org.apache.flink.streaming.runtime.io.benchmark}} stuff and the 
> repo [flink-benchmark|https://github.com/dataArtisans/flink-benchmarks], I 
> proposal to introduce another micro-benchmark which focuses on {{JobMaster}} 
> schedule performance
> h3. Target
> Benchmark how long from {{JobMaster}} startup(receive the {{JobGraph}} and 
> init) to all tasks RUNNING. Technically we use bounded stream and TM finishes 
> tasks as soon as they arrived. So the real interval we measure is to all 
> tasks FINISHED.
> h3. Case
> 1. JobGraph that cover EAGER + PIPELINED edges
> 2. JobGraph that cover LAZY_FROM_SOURCES + PIPELINED edges
> 3. JobGraph that cover LAZY_FROM_SOURCES + BLOCKING edges
> ps: maybe benchmark if the source is get from {{InputSplit}}?
> h3. Implement
> Based on the flink-benchmark repo, we finally run benchmark using jmh. So the 
> whole test suit is separated into two repos. The testing environment could be 
> located in the main repo, maybe under 
> flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/benchmark.
> To measure the performance of {{JobMaster}} scheduling, we need to simulate 
> an environment that:
> 1. has a real {{JobMaster}}
> 2. has a mock/testing {{ResourceManager}} that having infinite resource and 
> react immediately.
> 3. has a(many?) mock/testing {{TaskExecutor}} that deploy and finish tasks 
> immediately.
> [~trohrm...@apache.org] [~GJL] [~pnowojski] could you please review this 
> proposal to help clarify the goal and concrete details? Thanks in advance.
> Any suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to