[GitHub] [arrow] westonpace commented on issue #34354: `to_numpy().tolist()` is significantlly faster than `.tolist()`

via GitHub Mon, 27 Feb 2023 14:37:28 -0800


westonpace commented on issue #34354:
URL: https://github.com/apache/arrow/issues/34354#issuecomment-1447220076


   I'm not aware of anyone having tried particularly hard to optimize 
`to_pylist`.  I think the expectation at the moment is that it won't be used 
all that often on large lists, since a Python list is a very inefficient way to 
represent the data.
   
   However, at a glance, my guess is that the difference comes from Arrow 
implementing `to_pylist` mostly in Python:
   
   ```
       def to_pylist(self):
           """
           Convert to a list of native Python objects.
   
           Returns
           -------
           lst : list
           """
           return [x.as_py() for x in self]
   ```
   
   In NumPy, by contrast, the entire `tolist` function is implemented in C.  So 
in Arrow you get 500k Python calls and in NumPy you get one.  It should be 
fairly straightforward to implement a more efficient version in Arrow.  I would 
hope it could mostly be done in Cython.  If someone is interested in taking 
this on, I can try to give a few pointers / suggestions.
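
   As a rough illustration of the per-element overhead described above, here is a standard-library-only sketch.  `ToyScalar` and `ToyArray` are hypothetical stand-ins and not pyarrow classes; `array.array.tolist()` plays the role of a `tolist` implemented entirely in C.  It only models the shape of the problem and says nothing about how Arrow should actually implement the fix.

```python
import timeit
from array import array


class ToyScalar:
    """Hypothetical stand-in for an Arrow scalar wrapper (not a pyarrow class)."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


class ToyArray:
    """Hypothetical stand-in for an array backed by a contiguous C buffer."""

    def __init__(self, values):
        self._buf = array("q", values)  # signed 64-bit integers

    def __iter__(self):
        # Every element gets wrapped in a Python-level scalar object.
        for v in self._buf:
            yield ToyScalar(v)

    def to_pylist(self):
        # Same shape as the quoted method: one Python-level call per element.
        return [x.as_py() for x in self]

    def to_pylist_fast(self):
        # A single call into C; the loop runs inside array.tolist().
        return self._buf.tolist()


if __name__ == "__main__":
    arr = ToyArray(range(500_000))
    slow = timeit.timeit(arr.to_pylist, number=3)
    fast = timeit.timeit(arr.to_pylist_fast, number=3)
    print(f"per-element calls: {slow:.3f}s  single C call: {fast:.3f}s")
```

   Both methods return the same list; only the number of Python-level calls differs.  With pyarrow and NumPy installed, the real comparison from the issue title is `arr.to_pylist()` against `arr.to_numpy().tolist()`.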


