Re: [I] [Python][FlightRPC] Is it possible to use pyarrow.dataset as an abstraction over arrow-flight data? [arrow]

via GitHub Wed, 06 Dec 2023 14:38:23 -0800


robtandy commented on issue #37278:
URL: https://github.com/apache/arrow/issues/37278#issuecomment-1843801054


   @lidavidm Thanks for responding. I think I have the same need as 
@balshetzer, and I am not sure I fully understand the tooling.
   
   I would like to be able to use DuckDB to query across older data (held in 
Parquet files) as well as newer data held in Apache Arrow Tables hosted on a 
fleet of Arrow Flight servers.
   
   My understanding, from 
https://duckdb.org/docs/guides/python/sql_on_arrow and from experimentation, is 
that I need to construct a Dataset whose Fragments represent not only Parquet 
files but also the RecordBatchReaders returned from Arrow Flight.
   
   AFAICT this isn't possible at the moment, at least with pyarrow: a Dataset 
cannot be created from a RecordBatchReader.
   
   If I understand correctly, FlightSQL and ADBC facilitate communication 
between the database and its clients, whereas a Dataset provides an abstraction 
over the source data for the database itself.

