David Lee created ARROW-16629:
---------------------------------
Summary: Apache Arrow Flight transport speed improvement for list
structures
Key: ARROW-16629
URL: https://issues.apache.org/jira/browse/ARROW-16629
Project: Apache Arrow
Issue Type: Improvement
Components: FlightRPC
Affects Versions: 8.0.0
Reporter: David Lee
I just started testing Arrow Flight to send results from a GraphQL server
with FlightServer() running on it.
GraphQL defines a schema for your data output, which can be mapped to an Arrow
schema, so I thought it would make sense to try using Arrow Flight to transport
results instead of using REST-style JSON records.
Arrow Flight was at least 66% faster in all cases, but the speedup did not
scale as the number of child records increased. I suspect that serializing
structs or lists needs some improvement.
Here is the discussion I opened, including links to the test scripts:
[https://github.com/mirumee/ariadne/discussions/867]
With 10 records, it was 0.049 seconds faster (80% faster).
With 10,000 records, it was 0.109 seconds faster (66% faster).
With 10 million records, it was 54 seconds faster (66% faster).
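
The absolute timings are not stated here, but they can be estimated from these figures under one assumption: that "P% faster" means the time saved divided by the JSON time. That reading is an interpretation, not something the report confirms, and the implied_times helper below is hypothetical.

```python
# (records, seconds saved, fraction faster) as reported above
reported = [
    (10, 0.049, 0.80),
    (10_000, 0.109, 0.66),
    (10_000_000, 54.0, 0.66),
]


def implied_times(saved, fraction):
    """Return (json_time, flight_time), assuming fraction = saved / json_time."""
    json_time = saved / fraction
    return json_time, json_time - saved


for records, saved, fraction in reported:
    json_time, flight_time = implied_times(saved, fraction)
    print(f"{records:>10} records: JSON ~{json_time:.3f} s, Flight ~{flight_time:.3f} s")
```

Under that reading, Flight takes roughly a third of the JSON time at both 10,000 and 10 million records, so the ratio stays flat rather than improving with volume.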
Also, here is the data structure that is sent across the wire:
pyarrow.Table
data: struct<test_lists: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>>
  child 0, test_lists: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>
      child 0, float_list: list<item: double>
          child 0, item: double
      child 1, int_list: list<item: int64>
          child 0, item: int64
      child 2, length: int64
      child 3, string_list: list<item: string>
          child 0, item: string
      child 4, time_spent: double
data: [
-- is_valid: all not null
-- child 0 type: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>
-- is_valid: all not null
-- child 0 type: list<item: double>
[[13.500371672273381,17.747395152140353,28.973205439157457,1.361443415643098,19.029191125636135,14.62284718057391,18.44333922481529,7.906278860251386,14.402464768126993,5.826040531772251]]
-- child 1 type: list<item: int64>
[[23,3,21,15,20,4,10,16,23,25]]
-- child 2 type: int64
[10]
-- child 3 type: list<item: string>
[["qypsupwtxy","vrxptpspyt","qpvruwsuqq","ywwpyxrvrt","wswutpxxqv","tsyypstxvv","ytprpqsxsx","wtwsxvprvu","suwtrvqvwp","wtsrwywwty"]]
-- child 4 type: double
[0]]