Re: Performance of BeamFnData between Python and Java

Lukasz Cwik Wed, 07 Nov 2018 11:15:02 -0800

gRPC folks provide a bunch of benchmarks for different scenarios:
https://grpc.io/docs/guides/benchmarking.html
You would be most interested in the streaming throughput benchmarks since
the Data API is written on top of the gRPC streaming APIs.
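To compare your own numbers with those benchmarks, you can get a rough client-side throughput figure by timing how fast a consumer drains the stream. The sketch below is a minimal, hypothetical helper. `measure_stream_throughput` is not a Beam or gRPC API. It assumes the stream is any iterable of `bytes`, such as the response iterator of a gRPC streaming call, and it measures only how quickly the consumer drains that iterable.

```python
import time
from typing import Iterable, Tuple


def measure_stream_throughput(chunks: Iterable[bytes]) -> Tuple[int, float, float]:
    """Drain an iterable of byte chunks and return (total_bytes, seconds, bytes_per_sec).

    Hypothetical helper: `chunks` stands in for any byte stream, e.g. the
    response iterator of a gRPC streaming call (an assumption, not Beam code).
    """
    total = 0
    start = time.perf_counter()
    for chunk in chunks:
        total += len(chunk)  # count payload bytes only, not framing overhead
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    # Synthetic stand-in for a data stream: 1,000 chunks of 64 KiB each.
    fake_stream = (b"x" * 65536 for _ in range(1000))
    total, elapsed, rate = measure_stream_throughput(fake_stream)
    print(f"{total} bytes in {elapsed:.3f}s -> {rate / 1e6:.1f} MB/s")
```

Running the same measurement on the Python side of the data channel gives a number you can compare directly against the ~200KB/s figure.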


200KB/s does seem pretty small. Have you captured any Python profiles[1]
and looked at them?
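Independently of the approach in [1], you can use the standard-library `cProfile` module to profile a suspect code path directly. In this sketch, `decode_elements` is a hypothetical stand-in for per-element work in the SDK worker, and `profile_call` is an illustrative helper, not part of Beam.

```python
import cProfile
import io
import pstats


def decode_elements(payload: bytes) -> int:
    """Hypothetical stand-in for per-element work done in the Python SDK worker."""
    count = 0
    for line in payload.split(b"\n"):
        if line:  # skip empty records
            count += 1
    return count


def profile_call(func, *args, top: int = 10):
    """Run func(*args) under cProfile; return (result, report of top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(top)  # limit the report to the `top` heaviest entries
    return result, buf.getvalue()


if __name__ == "__main__":
    data = b"record\n" * 100_000
    count, report = profile_call(decode_elements, data)
    print(count)
    print(report)
```

Sorting by cumulative time surfaces the call paths that dominate wall time. This helps you tell whether the bottleneck is in Python-side processing or in the transport.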

1:
https://lists.apache.org/thread.html/f8488faede96c65906216c6b4bc521385abeddc1578c99b85937d2f2@%3Cdev.beam.apache.org%3E


On Wed, Nov 7, 2018 at 10:18 AM Hai Lu <[email protected]> wrote:

> Hi,
>
> This is Hai from LinkedIn. I'm currently working on the Portable API for the
> Samza Runner. I was able to make Python work with a Samza container reading
> from Kafka. However, I'm seeing a severe performance issue with my setup,
> achieving only ~200KB/s of throughput between the Samza runner on the Java
> side and the sdk_worker on the Python side.
>
> While I'm digging into this, I wonder whether someone has benchmarked the
> data channel between Java and Python and has results on how much
> throughput can be reached, assuming a single worker thread and a single
> JobBundleFactory.
>
> I might be missing some very basic gRPC setting that leads to these
> unsatisfactory results. So another question is whether there are any good
> articles or documentation about gRPC tuning dedicated to IPC.
>
> Thanks,
> Hai
