Bottom line up front:
1. The cost of making 10,000 individual REST calls is more than two orders of
magnitude higher than the cost of a single batch REST call (10,000 * 0.05
seconds = 500 seconds vs. 1.4 seconds, roughly 350x).
2. The time to complete a batch REST call plateaus at about 10,000
application reports per call.

Full story:
I ran experiments to measure how long it takes to fetch application reports
from YARN with the REST API. My objective was to compare a single batch REST
call that fetches all application reports against individual REST calls,
one per application report.
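
To make the two access patterns concrete, here is a minimal Python sketch of
the URLs each one hits on the ResourceManager. The batch form matches the
curl command used in the experiment (/ws/v1/cluster/apps with
applicationTypes and limit); the per-application form uses the
/ws/v1/cluster/apps/{appid} endpoint. The helper names, host, port, and
application IDs below are hypothetical placeholders, not the script used
for the measurements.

```python
from urllib.parse import urlencode


def batch_apps_url(rm_host: str, rm_port: int, application_types: str, limit: int) -> str:
    """Batch pattern: one call that returns up to `limit` application reports."""
    query = urlencode({"applicationTypes": application_types, "limit": limit})
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps?{query}"


def single_app_url(rm_host: str, rm_port: int, app_id: str) -> str:
    """Individual pattern: one call per application report."""
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps/{app_id}"


if __name__ == "__main__":
    # Hypothetical RM address and application type, for illustration only.
    host, port = "rm.example.com", 8088
    print(batch_apps_url(host, port, "SPARK", 10000))
    # The individual pattern needs N round trips to fetch N reports.
    for app_id in (f"application_1700000000000_{i:04d}" for i in range(1, 4)):
        print(single_app_url(host, port, app_id))
```

With 10,000 reports, the individual pattern means 10,000 separate HTTP round
trips, which is where the per-call overhead adds up.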

I ran the tests on 4 different clusters: 1) a test cluster, 2) a moderately
used dev cluster, 3) a lightly used production cluster, and 4) a heavily
used production cluster. On each cluster I made 7 kinds of REST call,
requesting 1, 10, 100, 1,000, 10,000, 100,000, and 1,000,000 application
reports respectively. I repeated each call 200 times to account for
variation and reported the median time.
To measure the time, I used the following curl command:

$ curl -o /dev/null -s -w "@curl-output-fromat.json" \
    "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"

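For anyone who wants to script the same sweep, here is a rough Python
equivalent of the procedure described above: each of the seven limits is
requested a fixed number of times (200 by default) and the median time and
response size are reported. This is a sketch under stated assumptions, not
the script behind the charts; fetch, median_fetch, run_sweep, and LIMITS are
hypothetical names, and timings measured with urllib will not match curl's
own timing variables exactly.

```python
import statistics
import time
import urllib.request
from typing import Callable, List, Tuple
from urllib.parse import urlencode

LIMITS = [10 ** k for k in range(7)]  # 1, 10, ..., 1,000,000


def fetch(url: str) -> bytes:
    """Download the full response body (assumed 60 s timeout)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def median_fetch(url: str, repeats: int = 200,
                 fetcher: Callable[[str], bytes] = fetch) -> Tuple[float, int]:
    """Call `url` `repeats` times; return (median seconds, median bytes)."""
    times: List[float] = []
    sizes: List[int] = []
    for _ in range(repeats):
        start = time.perf_counter()
        body = fetcher(url)
        times.append(time.perf_counter() - start)
        sizes.append(len(body))
    return statistics.median(times), int(statistics.median(sizes))


def run_sweep(base_url: str, application_types: str, repeats: int = 200) -> None:
    """Print median time and size for each limit (needs a reachable RM)."""
    for limit in LIMITS:
        query = urlencode({"applicationTypes": application_types, "limit": limit})
        seconds, size = median_fetch(f"{base_url}?{query}", repeats)
        print(f"limit={limit:>9,}  median={seconds:.3f}s  bytes={size:,}")
```

Usage would look like
run_sweep("http://<rm_http_address>:<rm_port>/ws/v1/cluster/apps", "SPARK"),
with the address and application type filled in for your cluster. The
fetcher parameter exists so the timing logic can be exercised without a
live ResourceManager.
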
The attached charts show the results. In all of the charts, the x-axis
shows the number of application reports requested in the call.
The bar chart shows the time it takes to complete a REST call on each
cluster.
The first line plot shows the same results as the bar chart on a log
scale, which makes it easier to see that the time to complete the REST call
plateaus at 10,000.
The last chart shows the amount of data downloaded by each REST call, which
explains why the time plateaus at 10,000: beyond that point, a larger limit
no longer increases the size of the response.
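
One way to locate that plateau programmatically is to look for the first
limit after which the downloaded size stops growing. The sketch below is a
hypothetical helper (plateau_limit is not part of any library) that takes
(limit, bytes_downloaded) pairs, such as those printed by a timing sweep,
and returns the smallest limit where every later size stays within a
relative tolerance (assumed 5%) of the size at that limit. The numbers in
the test are synthetic, not the measurements from the charts.

```python
from typing import Optional, Sequence, Tuple


def plateau_limit(points: Sequence[Tuple[int, int]],
                  tolerance: float = 0.05) -> Optional[int]:
    """Return the smallest limit whose response size is not exceeded by more
    than `tolerance` (relative) at any larger limit, or None if no such limit.

    `points` is a sequence of (limit, bytes_downloaded) pairs sorted by limit.
    The last point is never reported, since nothing follows it.
    """
    for i in range(len(points) - 1):
        base = points[i][1]
        if base > 0 and all(
            (later - base) / base < tolerance for _, later in points[i + 1:]
        ):
            return points[i][0]
    return None
```

If response time is dominated by transfer size, the time plateau should
appear at the same limit as the size plateau.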


[image: transfer_time_bar_plot.png]
[image: transfer_time_line_plot.png]
[image: size_downloaded_line_plot.png]

Thanks,
Meisam
