paleolimbot commented on issue #331: URL: https://github.com/apache/arrow-nanoarrow/issues/331#issuecomment-2119499843
It's a great point that, given an arbitrary `nanoarrow_array_stream`, there's no way to know how expensive it will be to consume (or whether it supports being consumed from another thread, or maybe other things).

I am not sure this can be added to the `nanoarrow_array_stream` itself: the object is a sort of "safe home" for the underlying stream, and there is quite a lot of nanoarrow/R code that moves the C structures from one home to another. Ensuring that the attribute stayed up-to-date would be tricky (but possible if this is important).

Another thing that could be done is to add an argument to `as_nanoarrow_array_stream()` such that one could do `as_nanoarrow_array_stream(something, only_consume_if_this_will_be_fast = TRUE)` (obviously with a more compact name). I'm not sure exactly how that would be implemented everywhere, though: often the database or Acero or object that is being exported doesn't have a way to query this, either. ...or maybe other ideas?

I think a user has some context when typing these things, though: if a user types `some_arrow_dplyr_query |> as_polars_df()`, I am not sure they will be surprised that it takes a while if they just typed a big query (you might be able to compensate for that in `as_polars_df()` by checking for user interrupts when consuming the stream).
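
As a rough illustration of the interrupt idea, here is a minimal sketch of consuming a stream one batch at a time from R. The helper `collect_batches()` and its `on_batch` argument are hypothetical names, not part of nanoarrow or polars. The sketch assumes that pulling batches in an R-level loop gives R a chance to honor a user interrupt between batches; it would not help while a single `get_next()` call is blocked inside the producer.

```r
library(nanoarrow)

# Hypothetical helper (not part of nanoarrow): pull batches one at a time
# instead of collecting the whole stream in a single call into C.
collect_batches <- function(x, on_batch = NULL) {
  stream <- as_nanoarrow_array_stream(x)
  batches <- list()

  repeat {
    # get_next() returns NULL once the stream is exhausted
    batch <- stream$get_next()
    if (is.null(batch)) {
      break
    }

    batches[[length(batches) + 1L]] <- batch

    # Optional progress hook; because this loop runs in R, a user interrupt
    # can be handled between batches (assumption: not during get_next())
    if (!is.null(on_batch)) {
      on_batch(length(batches))
    }
  }

  batches
}

# Small in-memory stream standing in for an expensive query
stream <- basic_array_stream(
  list(as_nanoarrow_array(1:3), as_nanoarrow_array(4:6))
)
batches <- collect_batches(stream)
```

A consumer implemented in compiled code would need its own equivalent check between batches; the loop above only shows where such a check would sit.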
