[ https://issues.apache.org/jira/browse/MESOS-10026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970594#comment-16970594 ]

Benjamin Mahler commented on MESOS-10026:
-----------------------------------------

Some preliminary numbers from a prototype 
https://github.com/bmahler/mesos/tree/bmahler_v1_operator_api_read_performance

{noformat}
Before:
v0 '/state' response took 6.549942141secs
v1 'master::call::GetState' application/x-protobuf response took 24.081624381secs
v1 'master::call::GetState' application/json response took 22.760332466secs
{noformat}
{noformat}
After:
v0 '/state' response took 7.57313099secs
v1 'master::call::GetState' application/x-protobuf response took 5.240223816secs
v1 'master::call::GetState' application/json response took 1.76133347258333mins
{noformat}

However, as you can see, protobuf's built-in json conversion turns out to be 
extremely slow, at least when going from serialized protobuf to serialized json 
(I haven't run perf to see why). This means we can't really use the built-in 
json facilities (see MESOS-9896), and we need two code paths: one doing direct 
protobuf serialization and one doing direct json serialization via jsonify. 
I implemented that and got the following:

{noformat}
After:
v0 '/state' response took 7.743768168secs
v1 'master::call::GetState' application/x-protobuf response took 5.640594663secs
v1 'master::call::GetState' application/json response took 11.795411549secs
{noformat}

> Improve v1 operator API read performance.
> -----------------------------------------
>
>                 Key: MESOS-10026
>                 URL: https://issues.apache.org/jira/browse/MESOS-10026
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Major
>              Labels: foundations
>
> Currently, the v1 operator API has poor performance relative to the v0 json 
> API. The following initial numbers were provided by [~Will Mahler] from our 
> state serving benchmark:
>  
> |OPTIMIZED - Master (baseline)| | | | |
> |Test setup|1000 agents with a total of 10000 running tasks and 10000 completed tasks|10000 agents with a total of 100000 running tasks and 100000 completed tasks|20000 agents with a total of 200000 running tasks and 200000 completed tasks|40000 agents with a total of 400000 running tasks and 400000 completed tasks|
> |v0 'state' response|0.17|1.66|8.96|12.42|
> |v1 x-protobuf|0.35|3.21|9.47|19.09|
> |v1 json|0.45|4.72|10.81|31.43|
> There is quite a lot of variance, but v1 protobuf is consistently slower than 
> v0 (sometimes significantly so), and v1 json is consistently slower than v1 
> protobuf (sometimes significantly so).
> The reason that the v1 operator API is slower is that it does the following:
> (1) Construct a temporary unversioned state response object by copying the 
> in-memory unversioned state into an overall response object. (expensive!)
> (2) Evolve it to v1: serialize, then de-serialize into a v1 overall state 
> object. (expensive!)
> (3) Serialize the overall v1 state object to protobuf or json.
> (4) Destruct the temporaries. (expensive! but this is done after the response 
> starts serving)
> On the other hand, the v0 jsonify approach does the following:
> (1) Serialize the in-memory unversioned state into json, by traversing state 
> and accumulating the overall serialized json.
> This means that v1 has substantial overhead vs v0, and we need to remove it 
> to bring v1 on par with or better than v0: v1 should serialize directly to 
> json (straightforward with jsonify) or to protobuf (this can be done via an 
> io::CodedOutputStream).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)