The gRPC folks provide a bunch of benchmarks for different scenarios: https://grpc.io/docs/guides/benchmarking.html
You would be most interested in the streaming throughput benchmarks, since the Data API is written on top of the gRPC streaming APIs.

200KB/s does seem pretty small. Have you captured any Python profiles[1] and looked at them?

1: https://lists.apache.org/thread.html/f8488faede96c65906216c6b4bc521385abeddc1578c99b85937d2f2@%3Cdev.beam.apache.org%3E

On Wed, Nov 7, 2018 at 10:18 AM Hai Lu <[email protected]> wrote:

> Hi,
>
> This is Hai from LinkedIn. I'm currently working on Portable API for Samza
> Runner. I was able to make Python work with Samza container reading from
> Kafka. However, I'm seeing severe performance issue with my set up,
> achieving only ~200KB throughput between the Samza runner in the Java side
> and the sdk_worker in the Python part.
>
> While I'm digging into this, I wonder whether some one has benchmarked the
> data channel between Java and Python and had some results how much
> throughput can be reached? Assuming single worker thread and single
> JobBundleFactory.
>
> I might be missing some very basic and naive gRPC setting which leads to
> this unsatisfactory results. So another question is whether are any good
> articles or documentations about gRPC tuning dedicated to IPC?
>
> Thanks,
> Hai
