Re: [I] slow nanoarrow_array_stream to data.frame for strings; slow ALTREP materialization [arrow-nanoarrow]

via GitHub Wed, 05 Nov 2025 11:43:00 -0800


paleolimbot commented on issue #822:
URL: https://github.com/apache/arrow-nanoarrow/issues/822#issuecomment-3493017975


   Thank you for investigating! The conversion definitely needs some work... I started a PR some time ago to improve it but didn't have a chance to finish it. Perhaps now, with LLMs, we can do much better!
   
   The non-linearity is concerning... I wonder whether it comes from the cost of growing the string pool, or whether some conversion is happening more often than it should.
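
One way to confirm that the slowdown really is non-linear is to time the conversion at several sizes (for example, doubling the number of strings each time) and fit the slope on a log-log scale. A slope near 1 means the cost grows linearly. A slope near 2 points to something quadratic, such as repeated work per element. The sketch below is in Python purely for illustration. `scaling_exponent` is a hypothetical helper, not part of nanoarrow or arrow, and the `(n, seconds)` pairs would come from whatever R benchmark is being run.

```python
import math

def scaling_exponent(timings):
    """Estimate k in t ~ c * n**k from (n, seconds) pairs.

    Hypothetical helper (not a nanoarrow/arrow API): fits log(t) against
    log(n) by ordinary least squares and returns the slope k.
    k ~ 1 suggests linear scaling; k ~ 2 suggests quadratic behaviour.
    """
    if len(timings) < 2:
        raise ValueError("need at least two (n, seconds) measurements")
    xs = [math.log(n) for n, _ in timings]
    ys = [math.log(t) for _, t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Least-squares slope: cov(x, y) / var(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("measurements need at least two distinct sizes")
    return num / den
```

For example, timings that quadruple each time the input doubles, such as `[(1_000_000, 1.0), (2_000_000, 4.0), (4_000_000, 16.0)]`, give a slope of about 2.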
   
   > this following path, which also tries to go via arrow::Table, is just as slow as the direct nanoarrow->data.frame path. not sure if this give any ideas to root cause of problem?
   
   I have had problems benchmarking this kind of thing in the past, because putting that many R objects into the session makes it very hard for the garbage collector to keep up. This is particularly true when benchmarking, which creates and discards objects rapidly and puts the collector under intense pressure.
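
A common mitigation is to force a full collection before each timed iteration, so that garbage left over from the previous run is less likely to be collected, and billed, inside the next measurement. In R this would mean calling `gc()` between iterations. The Python harness below only illustrates the pattern: `timed_runs` is a hypothetical name, and Python's collector behaves differently from R's, so treat this as a sketch of the idea rather than a drop-in benchmark.

```python
import gc
import time

def timed_runs(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock seconds per run.

    Hypothetical harness. It forces a full collection before each run, a
    Python analogue (assumption) of calling R's gc() between benchmark
    iterations, so garbage from earlier runs is less likely to be collected
    during the timed region.
    """
    times = []
    for _ in range(repeats):
        gc.collect()                      # start each run from a collected heap
        start = time.perf_counter()
        result = fn()
        times.append(time.perf_counter() - start)
        del result                        # release this run's output before the next collection
    return times
```

Reporting the median of the returned times (for example with `statistics.median`) is usually more stable than relying on a single run.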
   
   In general, I would expect materializing large numbers of strings to perform about the same with nanoarrow and arrow, because neither can avoid all of the calls into the R API and the global string pool.

