Hey Carl,

Thanks for your time!
I see the same performance numbers as reported in the protobuf performance dashboard (~0.2ms per RPC) if I set up the HelloWorld client to call SayHello 1000 times in a tight loop and average over the 1000 calls.

How I arrived at the numbers I reported (everything is measured on localhost; the links point to my client and server code on GitHub):

Client RPC stub ( https://github.com/aakshintala/darknet/blob/cf1c4dfeb2a2f1c3d123bc89f90edbb37854b25d/server/client.cpp#L100 ):
  perform any necessary data manipulation
  pack the Request message
  *start = getTimeOfDay()*
  stub->invokeRPC()
  checkStatus()
  *end = getTimeOfDay()*
  *Client RTT = end - start*

In the server RPC ServiceImpl ( https://github.com/aakshintala/darknet/blob/cf1c4dfeb2a2f1c3d123bc89f90edbb37854b25d/server/server.cpp#L43 ):
  *start = getTimeOfDay()*
  serviceRequest()  <- GPU time is calculated inside this function, but ignore that for now.
  *end = getTimeOfDay()*
  return Status
  *Server time = end - start*

*gRPC + protobuf overhead = Client RTT - Server time*

I replaced protobuf with FlatBuffers yesterday, after noticing (using perf) that a significant chunk of processing time was spent in protobuf serialization and deserialization code. Latency really improved with FlatBuffers (no parsing step, so that's expected), but man, is that library hard to use and debug compared to protobuf...

New numbers *with FlatBuffers:*
*Client RTT = ~70ms*
*Server time = ~40ms*
*gRPC + FlatBuffers = ~30ms (for ~4MiB of data over localhost)*

Thanks for the link to pprof. Will check it out.

Cheers,
Amogh Akshintala
http://aakshintala.com

On Wed, Sep 05, 2018 at 1:59 PM, "'Carl Mastrangelo' via grpc.io" wrote:
> Our own benchmarks ( https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5636470266134528 ) get about 1000x better latency than that, so something is definitely up.
> Can you describe how you arrived at that number, or the tools you used to profile?
> (We use perf and pprof)
>
> On Monday, September 3, 2018 at 10:39:45 AM UTC-7, [email protected] wrote:
>> Hail gRPC experts (;D),
>>
>> I'm trying to build an image/video object detection server (as one of the reusable pieces in a benchmark suite) with low RTT requirements (near-realtime, say ~60-90ms RTT)...
>> I've used gRPC and protobuf (built from git master; hashes below in case that is relevant) for the serialization and transport.
>> _________________________________
>> grpc:
>> commit dbc1e27e2e1a81b61eb064eb036ec6a267f88cb6
>> Merge: 9bc6cd1 5d24ab9
>> Author: Jiangtao Li <email redacted by me>
>> Date: Fri Jul 20 17:00:18 2018 -0700
>>
>> protobuf:
>> commit b5fbb742af122b565925987e65c08957739976a7
>> Author: Bo Yang <email redacted by me>
>> Date: Mon Mar 5 19:54:18 2018 -0800
>> _________________________________
>>
>> gRPC seems to add insane amounts of overhead -- ~160ms (~2x the server's processing time)!
>> For now I'm running on a single machine (a pretty beefy machine, so contention isn't an issue...) operating over localhost (loopback).
>> The amount of data being transferred is considerable, but not unheard of (~4MiB per request).
>>
>> Server-side timing measurements:
>> doDetection: new request 0x7ffc77f16920
>> 0x7ffc77f16920: GPU processing took 24.045 milliseconds
>> 0x7ffc77f16920: Server took *72.206 milliseconds*
>>
>> Client-side measurements:
>> 10 objects detected.
>> This request took *234.825 milliseconds*
>>
>> *Client RTT - Server processing time = 234.825 - 72.206 = 162.619ms (!??!)*
>>
>> I've pinned the server and client to separate cores using taskset.
>> There isn't anything else running on the server, and it's a beefy 48-core (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz) machine with ample RAM (128GiB), etc....
>> As a start, I instrumented the implementation of the synchronous call in include/grpcpp/impl/codegen/client_unary_call.h:
>>
>> BlockingUnaryCallImpl(ChannelInterface* channel, const RpcMethod& method,
>>                       ClientContext* context, const InputMessage& request,
>>                       OutputMessage* result)
>>
>> and found that the vast majority of the time is spent spinning on a completion queue:
>> line 107: if (cq.Pluck(&ops)) {
>>
>> I wonder if I need to configure gRPC differently (perhaps the default configuration is geared more towards latency-insensitive batching?)...
>>
>> Any help understanding these numbers would be appreciated.
>>
>> Server code: https://github.com/aakshintala/darknet/blob/master/server/server.cpp
>> Client code: https://github.com/aakshintala/darknet/blob/master/server/client.cpp
>> Proto file: https://github.com/aakshintala/darknet/blob/master/server/darknetserver.proto
>>
>> Thanks in advance,
>> Amogh Akshintala
>> aakshintala.com ( http://aakshintala.com )

--
You received this message because you are subscribed to a topic in the Google Groups "grpc.io" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/grpc-io/USjGJDmu_Hw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/c37ac6ed-9149-43dc-b9a3-5574e4eca439%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
