Ah, I thought you were trying to measure the latency of a single RPC.  We 
have two QPS benchmarks: an open-loop and a closed-loop benchmark.  The 
closed-loop benchmark runs the single-RPC latency benchmark in parallel 
with 200 copies, so there are only ever 200 active RPCs at a time.  The 
latency is recorded, but not published anywhere.

From your description, the open-loop benchmark sounds more like what you 
are doing.  We have a client that has a target QPS and uses an 
exponentially distributed delay between starting RPCs.  This simulates real 
traffic better and produces occasional bursts of RPCs.  We use this to 
measure CPU usage while holding the QPS constant.


Larger payloads making the system faster is odd, and may be explained by 
your benchmark machine.  For example, if there is no work for gRPC to do, 
it will go to sleep.  When the amount of work is too low, it spends a lot 
of time waking up and going back to sleep, lowering overall performance.  
Counterintuitively, adding more work (with bigger payloads) means the 
system never goes to sleep and thus accomplishes more real work.  We work 
around this by trying to keep the machine as close to 100% CPU as possible 
without going over.  Additionally, we disable CPU frequency scaling to 
ensure stable results.  (The CPU down-clocks while waiting for network 
traffic, and doesn't speed back up fast enough when data arrives.)
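
For reference, one common way to pin the clocks on Linux (assuming the cpufreq sysfs interface is present; requires root):

```shell
# Set the "performance" governor so cores stay at their maximum
# frequency instead of down-clocking during idle gaps between RPCs.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$g" > /dev/null
done

# Equivalent with the cpupower utility, if it is installed:
# sudo cpupower frequency-set -g performance
```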


We benchmark almost exclusively on Linux.




On Monday, February 5, 2018 at 4:32:55 PM UTC-8, [email protected] wrote:
>
> We actually have 8 threads sending bursts of requests simultaneously and 
> measuring each request individually. We use bursts of requests and then 
> wait for some time to avoid hammering the server with a huge number of 
> requests. You seem to be describing a single client that sends one 
> request and waits for the response before sending another. We are not 
> doing that; we are simulating a kind of QPS approximation and measuring 
> the latency.
>
> The behavior I'm seeing is that smaller payloads are slower than bigger 
> payloads. I was thinking it might have to do with some buffer taking 
> longer to be filled and sent over the wire.
>
> The results you mention, are they running on the Windows stack? 
>
> Thanks
>
> Eduardo
>
> On Monday, February 5, 2018 at 4:24:15 PM UTC-8, Carl Mastrangelo wrote:
>>
>> By closed loop I mean starting a new RPC upon completion of one.  I think 
>> that is the same as your option b).  These should always be faster with 
>> small payloads than with larger payloads, which it seems like you are 
>> saying is happening?   
>>
>>
>> We have closed-loop latency tests that use a 1-byte payload and measure 
>> the 50th and 99th percentiles.  We see about 100us per RPC at the 50th.
>>
>>  
>>
>> On Monday, February 5, 2018 at 4:16:29 PM UTC-8, [email protected] 
>> wrote:
>>>
>>> With closed loop do you mean 
>>>
>>> a) using loopback?
>>> b) measuring from when the request is made and finish measuring when the 
>>> response gets back?
>>>
>>> In our test we are not using loopback (two VMs over the network). We 
>>> start measuring right before calling into ClientAsyncResponseReader and 
>>> calling Finish, and we stop measuring when we get the response back and 
>>> our callback is called.
>>>
>>> If closed loop means something else please explain further.
>>>
>>> I may be able to share the code but before I go through that process do 
>>> you have any general suggestions that I can try or consider?
>>>
>>> Thanks
>>>
>>> Eduardo
>>>
>>>
>>> On Monday, February 5, 2018 at 3:43:34 PM UTC-8, Carl Mastrangelo wrote:
>>>>
>>>> Are you doing a closed loop latency test like gRPC benchmarking does?  
>>>>  Also, can you show your code?
>>>>
>>>> On Monday, February 5, 2018 at 3:10:03 PM UTC-8, [email protected] 
>>>> wrote:
>>>>>
>>>>> Hi, I'm working on a custom latency test. I'm using payloads of sizes 
>>>>> 1 byte, 200 bytes, 1 KB, and 10 KB. The 1-byte tests show a very big 
>>>>> difference from the rest of the payloads (longer/worse latency).
>>>>>
>>>>> I'm working on gRPC for C++ on Windows. I'm guessing this has to do 
>>>>> with some HTTP/2 packing or optimization logic, meaning it is taking 
>>>>> longer for the packets to be sent until a buffer is filled.
>>>>>
>>>>> What configuration should I look at modifying to see if I can 
>>>>> improve this behavior?
>>>>>
>>>>> I've tried looking around in 
>>>>>
>>>>> https://github.com/grpc/grpc/blob/master/include/grpc/grpc.h
>>>>>
>>>>> and in
>>>>>
>>>>>
>>>>> https://github.com/grpc/grpc/blob/master/include/grpc/impl/codegen/grpc_types.h
>>>>>
>>>>> with no luck. What do you suggest?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Eduardo
>>>>>
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/6c6cd447-1454-41d9-bedc-2f0dad2483b8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
