Tunneller commented on issue #42159:
URL: https://github.com/apache/arrow/issues/42159#issuecomment-2170523839
Yes, that does work, thanks.
If query is an SQL string and bq_client is a BigQuery client (with
credentials etc. already configured), then:
job_query = bq_client.query(query)
q = job_query.result()
creates a temporary table on BigQuery holding the results of the SQL
query. That table is stored in a columnar layout, broadly similar to an
Arrow file. The recommended way to pull it off BigQuery is py = q.to_arrow(),
followed by py.to_pandas() for local analysis. Because BigQuery is
column oriented, it internally takes advantage of encodings along each column.
And Arrow now has run-end encoding with to_pandas() support.
So what is missing from the chain is a way to pass encoded columns in
BigQuery straight through to encoded columns in Arrow, without inflating
the data in between. At the moment, passing data from BigQuery to Arrow
is the bottleneck.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]