We are planning to write our own message bus for a low-latency ad platform. With very tight SLAs on response times (<80ms), we want a solution that gives us good throughput while keeping most of the traffic within this latency.
What complicates the problem is that there are 4 hops involved before a response is sent out. This means we either write our own RPC implementation in C (*shudders*) or use a substitute that can get us there. Since I am responsible for this, I would like to evaluate whether gRPC is the right fit for such a peculiar use case.

Thanks,

On Tue, Aug 9, 2016 at 6:12 PM, Carl Mastrangelo <[email protected]> wrote:

> The latency numbers are a little tricky to interpret with respect to throughput. Latency and throughput are at odds with each other, and optimizing one usually comes at the cost of the other. (And, generally speaking, latency is more important than throughput.)
>
> When testing for latency, I create a client and server running on separate machines. The client sends a single message and waits for a response; upon receiving the response it sends another. We call this a closed-loop benchmark. It is effectively single-threaded, in order not to introduce additional noise into the system. (We also vary whether or not to use an additional executor when handling responses, which can change the latency by about 25us.) In such a setup, I can do around 200us latency, which ends up being around 5000 qps for a single core.
>
> When running the benchmark trying to max out QPS, I can get much higher throughput. The latency in such tests is around 50ms at the median, for an aggregate throughput of about 300-400 Kqps (and 186ms at the 99.9th percentile). This is running a Java server and client, each on a 32-core machine. We can go much higher, and I have filed a number of performance issues on the grpc-java GitHub project. (All the code is available in our benchmarks directory, so you can reproduce the numbers yourself.)
>
> We give you good defaults out of the box with gRPC. The numbers I am mentioning here are achieved by looking more thoroughly into the setup and making the appropriate changes.
> Our setup is careful to avoid lock contention, avoid thread context switches, avoid allocating memory where possible, and to obey the flow control signals. We prefer the async API to the synchronous one.
>
> All our numbers are visible on the dashboard, as previously mentioned. Describing your use case will help determine what approximate performance you can expect, and how to achieve it.
>
> On Tue, Aug 9, 2016 at 5:25 PM, Pradeep Singh <[email protected]> wrote:
>
>> Thanks Carl.
>>
>> And what throughput can you achieve with these latencies? Sending one request and receiving one response is fine, but what happens to latencies when the request rate reaches 50K requests per second -- in particular, what are the average latency and throughput at the point when CPU cores are saturated at either the client or the server?
>>
>> I agree that latency and throughput do not go hand in hand, but I would love to know your numbers before they start crossing millisecond latency boundaries.
>>
>> --Pradeep
>>
>> On Tue, Aug 9, 2016 at 5:00 PM, Carl Mastrangelo <[email protected]> wrote:
>>
>>> On machines that are within the same network, you can expect latencies in the low hundreds of microseconds. I have personally measured 100 - 200 microseconds on nearby machines. I had to tune the server somewhat to achieve this, but it is possible.
>>>
>>> On Tuesday, August 9, 2016 at 10:33:31 AM UTC-7, Pradeep Singh wrote:
>>>>
>>>> Oh, I was running the benchmark included in the gRPC source code. I think it reuses the same connection.
>>>>
>>>> 300us sounds really good.
>>>>
>>>> What latency do you notice when the client and server are running on different hosts?
>>>> Thanks,
>>>>
>>>> On Tue, Aug 9, 2016 at 8:58 AM, Eric Anderson <[email protected]> wrote:
>>>>
>>>>> On Mon, Aug 8, 2016 at 12:35 AM, <[email protected]> wrote:
>>>>>
>>>>>> With a custom zmq messaging bus we get latency on the order of microseconds between 2 services on the same host (21 us avg) vs 2 ms avg for gRPC.
>>>>>
>>>>> Did you reuse the ClientConn between RPCs?
>>>>>
>>>>> In our performance tests on GCE (using not very special machines, where netperf takes ~100µs) we see ~300µs latency for unary and ~225µs latency for streaming in Go.

--
Pradeep Singh

To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAPpR%3DvWBqqfZYm0dhmAhP9iJUvNZYShpDi6muApc%3DYC1rMJ2%3DQ%40mail.gmail.com.
