Tunneller commented on issue #42159:
URL: https://github.com/apache/arrow/issues/42159#issuecomment-2170523839
Yes, that does work, thanks.
If query is an SQL string and bq_client is a BigQuery client (with
credentials etc. already configured), then:
job_query = bq_client.query(query)
q = job_query.result()
creates a temporary table on BigQuery holding the results of the SQL
query. That table is stored in a columnar layout, broadly similar to an
Arrow file. The recommended way to pull it off BigQuery is py = q.to_arrow(),
followed by py.to_pandas() for local analysis. Because BigQuery is
column oriented, it internally takes advantage of encodings along each column.
And Arrow now has run-end encoding with to_pandas() support.
So what is missing from the chain is a way to pass encoded columns in
BigQuery straight through to encoded columns in Arrow, without inflating
the data in between. At the moment, passing data from BigQuery to Arrow
is the bottleneck.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]