On 29/07/2019 at 15:13, David Li wrote:
> Ah, sorry, I was unclear - the performance issue is not with Flight at
> all, but with putting Arrow over gRPC naively.
> 
> At some point, we benchmarked gRPC-Python carrying Arrow data, and
> found that it only achieved ~half the throughput of Flight-Python. So
> implementing BigQuery-Flight would also avoid that performance
> pitfall, assuming the client library for BigQuery-Arrow uses
> gRPC-Python.
> 
> The reason we found is that since gRPC technically does not require
> Protobuf, it copies message payloads into a CPython bytestring, and
> the Python code then turns around and hands that to Protobuf,
> which then copies data into its data structures and gives it back to
> Python.

gRPC shouldn't need to copy the payload into a CPython bytestring.
Instead, it could instantiate a buffer-like Python object pointing to
the original data.  This is "easily" done in Cython, and gRPC-Python
already uses Cython:
https://cython.readthedocs.io/en/latest/src/userguide/buffer.html
https://docs.python.org/3/c-api/buffer.html
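
To illustrate the difference at the Python level, here is a minimal
sketch using only built-in types.  It is not gRPC code: the bytearray
merely stands in for a payload buffer owned by the transport, and the
variable names are made up for the example.

```python
# Hypothetical stand-in for a message payload owned by the transport layer.
payload = bytearray(b"arrow record batch bytes")

# What a naive binding does: build a bytes object, which copies the data.
copied = bytes(payload)

# What a buffer-like object does: expose the same memory, no copy.
view = memoryview(payload)

# Mutate the underlying buffer to show which object shares its memory.
payload[0] = ord("A")

print(copied[:5])        # the copy is unaffected by the mutation
print(bytes(view[:5]))   # the view reflects the mutation
print(view.obj is payload)
```

Any consumer that accepts buffer-protocol objects could read from such
a view directly, so the payload would not have to be copied on its way
into Python.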

Regards

Antoine.
