paleolimbot commented on issue #822: URL: https://github.com/apache/arrow-nanoarrow/issues/822#issuecomment-3493017975
Thank you for investigating! The conversion definitely needs some work. I started a PR some time ago to improve it but didn't have a chance to finish. Perhaps now with LLMs we can do much better! The non-linearity is concerning. I wonder whether it is related to the cost of growing the string pool or whether some conversion is happening more often than it should.

> this following path, which also tries to go via arrow::Table, is just as slow as the direct nanoarrow->data.frame path. not sure if this give any ideas to root cause of problem?

I have had problems in the past benchmarking this kind of thing because putting that many R objects into the session makes it very hard for the garbage collector to keep up (particularly when benchmarking, since rapidly creating and deleting objects puts intense pressure on it). In general I would expect the materialization of large numbers of strings to perform the same with both nanoarrow and arrow, because there's really no avoiding all of the calls to the R API and the global string pool.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
