Barring bugs, I seem to be pushing ~3.5 gigabytes per second on localhost with a single thread. This is without touching any of the default gRPC settings, so I don't know what tuning options are available:
```
$ PATH=release:$PATH release/flight-benchmark -num_threads 1 -num_streams 10 -records_per_batch 4096
Server running with pid 22098
Server listening on localhost:31337
Bytes read: 3200000000
Nanos: 869852908
Speed: 3508.36 MB/s
```

This is sending ~32 gigabytes from the perf server to the benchmark client in a little over 8 seconds. Take a look at the FlameGraph (I think this also captures the child benchmark server): https://www.dropbox.com/s/kkibfs9froh0mt3/flight-perf-20180916-1.svg?dl=0

It appears that gRPC's TCP / HTTP2 machinery dominates the runtime, which is what we want:

* A tiny fraction (< 5%) is spent in "deserialization" / IPC reconstruction
* About 10% of the time is spent copying memory into the outgoing gRPC buffer
* gRPC reads account for ~37% of the runtime
* gRPC sends account for ~25% of the runtime

All in all this looks pretty good to me.

[ Full content available at: https://github.com/apache/arrow/pull/2547 ]
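For reference, here is a rough sketch of the kind of gRPC-level knobs that could be experimented with later, assuming direct access to `grpc::ServerBuilder` / `grpc::ChannelArguments` (Flight wraps gRPC internally, so these may not be reachable without plumbing them through the Flight options; the constants are standard gRPC channel args, but the specific values are just guesses):

```cpp
// Sketch only: raw gRPC C++ tuning knobs that might be worth experimenting with.
// None of this is wired into Flight yet; the values below are placeholders.
#include <climits>
#include <grpcpp/grpcpp.h>

int main() {
  // --- Server side ---
  grpc::ServerBuilder builder;
  builder.AddListeningPort("localhost:31337", grpc::InsecureServerCredentials());
  // Lift the default 4 MB message cap so large record batches fit in one message.
  builder.SetMaxSendMessageSize(INT_MAX);
  builder.SetMaxReceiveMessageSize(INT_MAX);
  // HTTP/2-level knobs that affect large streaming payloads.
  builder.AddChannelArgument(GRPC_ARG_HTTP2_MAX_FRAME_SIZE, 1 << 20);      // 1 MB frames
  builder.AddChannelArgument(GRPC_ARG_HTTP2_WRITE_BUFFER_SIZE, 16 << 20);  // 16 MB write buffer
  // builder.RegisterService(&flight_service);  // the Flight gRPC service would go here
  // auto server = builder.BuildAndStart();

  // --- Client side ---
  grpc::ChannelArguments args;
  args.SetMaxReceiveMessageSize(-1);  // -1 means unlimited
  args.SetInt(GRPC_ARG_HTTP2_MAX_FRAME_SIZE, 1 << 20);
  auto channel = grpc::CreateCustomChannel(
      "localhost:31337", grpc::InsecureChannelCredentials(), args);
  (void)channel;
  return 0;
}
```

The default 4 MB gRPC message size cap in particular seems worth checking against larger batch sizes.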
