Barring bugs, I seem to be pushing about 3.5 gigabytes/second on localhost with 
a single thread. This is without touching any default gRPC settings, and I 
don't yet know what tuning options are available:

```
$ PATH=release:$PATH release/flight-benchmark -num_threads 1 -num_streams 10 -records_per_batch 4096
Server running with pid 22098
Server listening on localhost:31337
Bytes read: 3200000000
Nanos: 869852908
Speed: 3508.36 MB/s
```

This is sending ~3.2 gigabytes from the perf server to the benchmark client in a 
little under 0.9 seconds. 
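
As a quick sanity check, the reported speed can be reproduced from the byte and nanosecond counts in the output above. This is only a sketch: it assumes the benchmark's "MB" means 2^20 bytes, which is inferred from the numbers rather than confirmed from the benchmark source.

```python
# Values copied from the benchmark output above.
bytes_read = 3_200_000_000
nanos = 869_852_908

seconds = nanos / 1e9

# Assumption: "MB" in the "Speed" line means 2**20 bytes (MiB).
mib_per_s = bytes_read / seconds / 2**20

# The same rate in decimal gigabytes per second.
gb_per_s = bytes_read / seconds / 1e9

print(f"elapsed:    {seconds:.3f} s")
print(f"throughput: {mib_per_s:.2f} MiB/s")
print(f"throughput: {gb_per_s:.2f} GB/s")
```

Under that assumption the MiB/s figure matches the "Speed" line, and the decimal rate is a little above the "about 3.5 gigabytes/second" quoted at the top.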

Take a look at the FlameGraph (I think this also captures the child benchmark 
server):

https://www.dropbox.com/s/kkibfs9froh0mt3/flight-perf-20180916-1.svg?dl=0

It appears that gRPC's TCP / HTTP2 machinery dominates the runtime, which is 
what we want. 

* A tiny fraction (< 5%) is spent in "deserialization" / IPC reconstruction
* About 10% of time is spent copying memory onto the outgoing gRPC buffer
* gRPC reads account for ~37% of runtime
* gRPC sends account for 25% of runtime

All in all, this looks pretty good to me.

[ Full content available at: https://github.com/apache/arrow/pull/2547 ]
This message was relayed via gitbox.apache.org for [email protected]