prmoore77 opened a new issue, #968:
URL: https://github.com/apache/arrow-adbc/issues/968
Please provide a way to access the batch reader from an ADBC DBAPI cursor
that doesn't require using an underscore method/attribute.
More direct access to the batch reader would let folks write out batches of
records fetched from Flight SQL (or other ADBC sources) as they arrive. This
requires less memory, because the entire result set never has to be fetched
into memory first.
Currently, to access the batch reader from a Flight SQL server using the
Python ADBC Flight SQL driver (and DBAPI), one has to use an underscore
attribute (I believe), as demonstrated by this Python code:
```
import os

import adbc_driver_flightsql.dbapi as flight_sql
import pyarrow.parquet as pq


def main():
    with flight_sql.connect(
        uri="grpc+tls://localhost:31337",
        db_kwargs={
            "username": "flight_username",
            "password": os.environ["FLIGHT_PASSWORD"],
            "adbc.flight.sql.client_option.tls_skip_verify": "true",
        },
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(operation="SELECT * FROM orders")

            # We have to use an underscore attribute here...
            reader = cur._results._reader

            # The context manager closes the writer so the Parquet footer is written.
            with pq.ParquetWriter(where="orders.parquet", schema=reader.schema) as writer:
                total_rows: int = 0
                for batch in reader:
                    writer.write_batch(batch=batch)
                    total_rows += batch.num_rows
                    print(
                        f"Wrote batch of {batch.num_rows:,d} row(s) - "
                        f"total row(s) written thus far: {total_rows:,d}"
                    )

            print(f"Total number of rows written: {total_rows:,d}")


if __name__ == "__main__":
    main()
```
Please add a method called something like `fetch_arrow_reader` to
`dbapi.Cursor` to allow a more direct way to get the batch reader.
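Until something like that exists, one possible stopgap is a small helper that keeps the private access in a single place. This is only a sketch under stated assumptions, not part of the ADBC API: the name `fetch_arrow_reader` is just the one suggested above, and the helper relies on the same private `_results._reader` attributes used in the example, which may change between releases. The stand-in cursor exists only so the snippet runs without a Flight SQL server.

```python
from types import SimpleNamespace


def fetch_arrow_reader(cursor):
    """Hypothetical helper: return the batch reader behind an executed cursor.

    Relies on the private `_results._reader` attributes shown in the example
    above; these are not a public API and may change.
    """
    results = getattr(cursor, "_results", None)
    if results is None:
        raise RuntimeError("No result set available; call execute() first.")
    return results._reader


# Stand-in cursor for illustration only; a real one would come from conn.cursor().
fake_reader = iter([["batch-1"], ["batch-2"]])
fake_cursor = SimpleNamespace(_results=SimpleNamespace(_reader=fake_reader))

# Iterating the returned reader yields one batch at a time.
batches = list(fetch_arrow_reader(fake_cursor))
```

With a real cursor, the returned reader could be passed to the Parquet-writing loop above in place of `cur._results._reader`.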
Thank you.