Ah, sorry, I was unclear - the performance issue is not with Flight at
all, but with putting Arrow over gRPC naively.

At some point, we benchmarked gRPC-Python carrying Arrow data, and
found that it only achieved ~half the throughput of Flight-Python. So
implementing BigQuery-Flight would also avoid that performance
pitfall, assuming the client library for BigQuery-Arrow uses
gRPC-Python.

The reason we found is that, since gRPC technically does not require
Protobuf, it copies each message payload into a CPython bytestring.
The Python code then turns around and hands that bytestring to
Protobuf, which copies the data into its own data structures and gives
it back to Python. If we implemented a BigQuery Flight backend in C++
and wrote Python bindings, we could avoid all that.
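
To make the copies concrete, here is a rough sketch in plain Python.
It does not call gRPC, Protobuf, or Arrow at all: naive_path and
zero_copy_path are hypothetical stand-ins, and treating the final
hand-back to Python as a third copy is my assumption for illustration.

```python
# Stand-ins only: nothing here calls gRPC, Protobuf, or Arrow.

def naive_path(wire_buffer):
    """Mimic the naive route: each hop owns a fresh copy of the data."""
    copies = 0
    payload = bytes(wire_buffer)   # transport layer -> CPython bytestring
    copies += 1
    parsed = bytearray(payload)    # parser copies into its own storage
    copies += 1
    body = bytes(parsed)           # handed back to Python (assumed a copy)
    copies += 1
    return body, copies


def zero_copy_path(wire_buffer):
    """Mimic a binding that exposes the received buffer directly."""
    return memoryview(wire_buffer), 0


wire = bytearray(b"abcd" * 4)      # pretend this is a received message
body, naive_copies = naive_path(wire)
view, direct_copies = zero_copy_path(wire)

# Mutating the source shows which result still shares memory with it.
wire[0] = 0
print(naive_copies, direct_copies)
print(body[0] == ord("a"), view[0] == 0)
```

The memoryview stands in for what C++-backed bindings could hand to
Python without going through an intermediate bytestring.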

Best,
David

On 7/29/19, Antoine Pitrou <solip...@pitrou.net> wrote:
>
> Hi David,
>
> On Mon, 29 Jul 2019 09:06:52 -0400
> David Li <li.david...@gmail.com> wrote:
>>
>> If the current gRPC stub definitions are reasonably stable (in your
>> opinion), I might try implementing support. That might get reasonable
>> performance still, especially in Python (where I've found that a lot
>> of performance is lost copying messages into/out of CPython to work
>> with Protobuf & gRPC
>
> Can you elaborate on this performance issue?  Is it with our Flight
> Python bindings?
>
> Regards
>
> Antoine.
>
>
>
