paleolimbot commented on issue #331:
URL: https://github.com/apache/arrow-nanoarrow/issues/331#issuecomment-2119499843

   It's a great point that, given an arbitrary `nanoarrow_array_stream`, 
there's no way to know how expensive it will be to consume it (or if it 
supports being consumed from another thread, or maybe other things).
   
   I am not sure this can be added to the `nanoarrow_array_stream` object 
itself. The object is a sort of "safe home" for the underlying stream, and 
quite a lot of nanoarrow/R code moves the C structures from one home to 
another. Keeping such an attribute up to date would be tricky (but possible, 
if this is important).
   
   Another option would be to add an argument to 
`as_nanoarrow_array_stream()` so that one could write 
`as_nanoarrow_array_stream(something, only_consume_if_this_will_be_fast = 
TRUE)` (obviously with a more compact name). I'm not sure how that would be 
implemented everywhere, though: often the database, Acero, or whatever object 
is being exported has no way to query this either.
   
   ...or maybe other ideas?
   
   I think a user has some context when typing these things, though: if a user 
types `some_arrow_dplyr_query |> as_polars_df()` right after writing a big 
query, I doubt they will be surprised that it takes a while. (You might be 
able to compensate for that in `as_polars_df()` by checking for user 
interrupts while consuming the stream.)
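
A minimal sketch of that interrupt suggestion, written in Python for illustration. The consumer checks for an interrupt before each pull from the stream, so a long query can be abandoned partway through. `consume_stream`, `check_interrupt`, and `Interrupted` are hypothetical names. In an R package the check would presumably be something like `R_CheckUserInterrupt()` from R's C API. `get_next` stands in for the stream's `get_next` callback and returns `None` here where the C interface signals end-of-stream with a released array.

```python
class Interrupted(Exception):
    """Raised when the (hypothetical) interrupt check fires mid-consumption."""


def consume_stream(get_next, check_interrupt):
    """Pull batches until end-of-stream, checking for an interrupt between pulls.

    get_next: callable returning the next batch, or None at end-of-stream
    check_interrupt: callable returning True if the user asked to stop
    """
    batches = []
    while True:
        if check_interrupt():
            raise Interrupted(f"stopped after {len(batches)} batch(es)")
        batch = get_next()
        if batch is None:  # end-of-stream
            return batches
        batches.append(batch)


# Usage: a toy stream of three batches, and a user who "presses Ctrl+C"
# after the second batch has been read.
source = iter([[1, 2], [3], [4, 5, 6]])


def toy_get_next():
    return next(source, None)


checks = {"n": 0}


def toy_check_interrupt():
    checks["n"] += 1
    return checks["n"] > 2


try:
    consume_stream(toy_get_next, toy_check_interrupt)
except Interrupted as e:
    print(e)  # stopped after 2 batch(es)
```

The check only runs between batches. A single slow `get_next()` call, such as a database still computing its first batch, will still block until it returns.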


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.