amoeba commented on issue #44069:
URL: https://github.com/apache/arrow/issues/44069#issuecomment-2346917084
Hello @abduazizR, thanks for filing an issue. Some implementation details
are leaking here and I can see how this is a bit confusing. Your `to_arrow()`
call creates what we call a RecordBatchReader, which can only be consumed
once, and your call to `collect()` consumes it.
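As a rough illustration of that single-use behavior, here is a minimal sketch. It builds the reader with `as_record_batch_reader()` directly from a data frame instead of going through duckdb; that shortcut is an assumption made to keep the example small, and it is not what `to_arrow()` does internally.

```r
library(arrow)

# Assumption: build a RecordBatchReader straight from a data frame,
# standing in for the reader that to_arrow() returns.
reader <- as_record_batch_reader(iris)

# The first read drains every batch from the reader.
first <- reader$read_table()
nrow(first)

# The reader is now exhausted, so there is no further batch to fetch.
leftover <- reader$read_next_batch()
is.null(leftover)
```

Once the reader is drained, nothing rewinds it, which is why a second `collect()` on the same object does not give the data back.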
You have two workarounds here:
```r
# 1. Convert to an arrow Table first
x <- iris |>
  to_duckdb() |>
  to_arrow() |>
  as_arrow_table()

x |> collect() # can be called repeatedly

# 2. or call to_arrow() every time you need to collect
x <- iris |> to_duckdb()

x |>
  to_arrow() |>
  collect() # can be called repeatedly
```
Converting to a Table (option 1) has the downside of materializing the
entire Table in memory, but this might work fine for your use
case.
In theory we could probably do some trick to make your original code work,
like resetting the RecordBatchReader on repeated calls to `collect()`/`compute()`, but,
at the very least, documenting this in `to_arrow()` would be good. I can file
a PR for the latter. @nealrichardson do you have an opinion on leaving things
as-is or implementing a fix here?