paleolimbot opened a new pull request, #870:
URL: https://github.com/apache/arrow-adbc/pull/870

   I'm not sure if this is exactly required and I'm happy to implement it differently. One limitation of the existing driver is that (1) it will error if more than 2 GB of total text exists in one column and (2) the array stream's get_next will block until the entire result has been computed. A cool thing you can do in R is something like `read_adbc() |> arrow::as_record_batch_reader() |> arrow::write_dataset()` for bigger-than-memory queries...this behaviour is basically to support that.
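   
   As a rough illustration, here is a minimal sketch of that bigger-than-memory pattern (assuming the adbcdrivermanager and adbcpostgresql R packages; the URI, table, and output directory are placeholders):
   
   ```r
   library(adbcdrivermanager)
   
   # Connect to Postgres through the ADBC driver (URI is a placeholder)
   db <- adbc_database_init(
     adbcpostgresql::adbcpostgresql(),
     uri = "postgresql://localhost:5432/some_db"
   )
   con <- adbc_connection_init(db)
   
   # read_adbc() returns an array stream; with chunked output, get_next()
   # yields batches as they are computed instead of blocking on the full result
   read_adbc(con, "SELECT * FROM some_big_table") |>
     arrow::as_record_batch_reader() |>
     arrow::write_dataset("some_output_dir", format = "parquet")
   ```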
   
   Some open questions:
   
   - Default chunk size? I chose 16 MB...maybe it should be bigger? Smaller? I like MB instead of number of rows because it doesn't make assumptions about how big or small the rows are. When querying a PostGIS table, for example, polygon features can be several MB each.
   - How to configure the chunk size? Should there be a canonical statement option for this, or should the postgres driver make up its own? (See the sketch after this list.)
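   
   For the second question, purely as a sketch of what a statement-level option could look like (the option name below is made up for illustration and does not exist in the driver):
   
   ```r
   stmt <- adbc_statement_init(con)
   adbc_statement_set_sql_query(stmt, "SELECT * FROM some_big_table")
   
   # Hypothetical option name; shown only to illustrate configuring the
   # chunk size through the generic statement-option mechanism
   adbc_statement_set_options(
     stmt,
     list("adbc.postgresql.result_chunk_size_bytes" = "16777216")
   )
   ```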

