Hail gRPC experts (;D),
I'm trying to build an image/video object detection server (as one of the
reusable pieces in a benchmark suite) with a low RTT requirement
(near-realtime, say ~60-90ms RTT)...
I've used gRPC and protobuf (built from git master; hashes below in case
that is relevant) for the serialization and transport.
_________________________________
grpc:
commit dbc1e27e2e1a81b61eb064eb036ec6a267f88cb6
Merge: 9bc6cd1 5d24ab9
Author: Jiangtao Li <email redacted by me>
Date: Fri Jul 20 17:00:18 2018 -0700
protobuf:
commit b5fbb742af122b565925987e65c08957739976a7
Author: Bo Yang <email redacted by me>
Date: Mon Mar 5 19:54:18 2018 -0800
_________________________________
gRPC seems to add an inordinate amount of overhead -- ~160ms (over 2x the
server's processing time)!
For now I'm running on a single machine (a pretty beefy machine, so
contention isn't an issue...) operating over localhost (loopback).
The amount of data being transferred is considerable, but not unheard of
(~4MiB per request).
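As an aside, ~4MiB is right at gRPC's default maximum receive-message size,
so the limits have to be raised explicitly or requests this size start
failing. A minimal sketch of how that looks on the client side -- the
address and the 16 MiB cap here are placeholders, not values from my actual
code:

```cpp
#include <grpcpp/grpcpp.h>

// Sketch: raising gRPC's message-size limits, which default to 4 MiB on the
// receive side -- right at the size of one of my requests. The target
// address and the 16 MiB figure are placeholders.
std::shared_ptr<grpc::Channel> MakeChannel() {
  grpc::ChannelArguments args;
  args.SetMaxReceiveMessageSize(16 * 1024 * 1024);  // up to 16 MiB replies
  args.SetMaxSendMessageSize(16 * 1024 * 1024);     // up to 16 MiB requests
  return grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}
```

(ServerBuilder has matching SetMaxReceiveMessageSize/SetMaxSendMessageSize
calls on the server side.)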
Server-side timing measurements:
doDetection: new request 0x7ffc77f16920
0x7ffc77f16920: GPU processing took 24.045 milliseconds
0x7ffc77f16920: Server took *72.206 millisecond*
Client-side measurements:
10 objects detected.
This request took *234.825 milliseconds*
*Client RTT - Server processing time = 234.825 - 72.206 = 162.619ms (!??!)*
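For reference, the client-side number is just wall-clock time around the
blocking stub call, measured with std::chrono::steady_clock. A
self-contained sketch of that timing pattern (the helper name is mine, not
from the linked client code):

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Time an arbitrary callable in milliseconds -- the same pattern the client
// uses: steady_clock timestamps taken immediately around the blocking call.
template <typename F>
double TimeMs(F&& f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In the real client, the callable is the stub invocation; here it can be
anything blocking.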
I've pinned the server and client to separate cores using taskset.
There isn't anything else running on the server and it's a beefy 48 core
(Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz) machine with ample RAM
(128GiB), etc....
As a start, I instrumented the implementation of the synchronous call
in include/grpcpp/impl/codegen/client_unary_call.h:
BlockingUnaryCallImpl(ChannelInterface* channel, const RpcMethod& method,
                      ClientContext* context, const InputMessage& request,
                      OutputMessage* result)
and found that the vast majority of the time is spent spinning on a
completion queue:
line 107: if (cq.Pluck(&ops)) {
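In case it matters for diagnosis: that blocking path is equivalent to
driving the async API with an explicit CompletionQueue, where the caller
blocks in cq.Next() instead of inside Pluck(). A sketch of that pattern --
DarknetService, Detect, DetectionRequest, and DetectionReply are
hypothetical stand-ins for whatever my .proto actually generates:

```cpp
#include <grpcpp/grpcpp.h>
// #include "darknetserver.grpc.pb.h"  // generated stub; names below are
//                                     // stand-ins, not the real ones

// Async unary call: the caller owns the CompletionQueue and blocks on
// Next(), rather than BlockingUnaryCallImpl plucking from an internal queue.
grpc::Status AsyncDetect(DarknetService::Stub* stub,
                         const DetectionRequest& request,
                         DetectionReply* reply) {
  grpc::ClientContext ctx;
  grpc::CompletionQueue cq;
  grpc::Status status;

  auto rpc = stub->AsyncDetect(&ctx, request, &cq);
  rpc->Finish(reply, &status, /*tag=*/(void*)1);

  void* got_tag = nullptr;
  bool ok = false;
  cq.Next(&got_tag, &ok);  // blocks here instead of inside Pluck()
  return status;
}
```

I haven't measured whether this path behaves any differently
latency-wise -- it's just where I'd instrument next.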
I wonder if I need to configure gRPC differently (perhaps the default
configuration is geared more towards latency-insensitive batch workloads?)...
Any help understanding these numbers would be appreciated.
Server code:
https://github.com/aakshintala/darknet/blob/master/server/server.cpp
Client code:
https://github.com/aakshintala/darknet/blob/master/server/client.cpp
Proto file:
https://github.com/aakshintala/darknet/blob/master/server/darknetserver.proto
Thanks in advance,
Amogh Akshintala
aakshintala.com