robtandy commented on issue #37278: URL: https://github.com/apache/arrow/issues/37278#issuecomment-1843801054
@lidavidm Thanks for responding. I think I have the same need as @balshetzer, and I am not sure I understand the tooling perfectly.

I would like to use DuckDB to query across older data (held in Parquet files) as well as newer data held in Apache Arrow Tables hosted on a fleet of Arrow Flight serving machines. My understanding, from https://duckdb.org/docs/guides/python/sql_on_arrow and from experimentation, is that I need to construct a Dataset whose Fragments represent the Parquet files as well as the RecordBatchReaders returned from Arrow Flight. AFAICT this isn't possible at the moment, at least with pyarrow: Datasets cannot be created from a RecordBatchReader.

If I understand correctly, FlightSQL and ADBC facilitate communication between the database and its clients, whereas the Dataset provides an abstraction over the source data for the database itself.
