NickCrews opened a new pull request, #48581:
URL: https://github.com/apache/arrow/pull/48581

   ### Rationale for this change
   
   Fixes https://github.com/apache/arrow/issues/22081
   
   Efficiently and correctly create List arrays from NumPy arrays with ndim > 1.
   
   ### What changes are included in this PR?
   
   Before, `pa.array(np.arange(6).reshape(2,3))` would fail. Now it returns an 
array of length 2, where each element is a size-3 list.
   
   I think this is the only intuitive behavior. But if you can think of an 
alternative behavior a user might want/expect from this, then please let's talk 
about it.
   
   I am not familiar enough with numpy/pyarrow memory layout internals to 
know whether there are other cases besides the C-contiguous memory layout 
where we could use zero-copy. Even if there are, I'm not sure we need to 
bother with them. I bet the C-contiguous case covers 95% of usage.
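
As a rough illustration of the layout question above, this NumPy-only sketch shows which common 2-D arrays are C-contiguous and whether flattening them yields a view or a copy. It does not touch pyarrow and says nothing about how the PR itself detects the layout. The array names are just for the example.

```python
import numpy as np

base = np.arange(6).reshape(2, 3)  # freshly reshaped: C-contiguous
transposed = base.T                # same memory, read column-first
strided = base[:, ::2]             # skips every other column

for name, a in [("base", base), ("transposed", transposed), ("strided", strided)]:
    flat = a.reshape(-1)  # a view when the layout allows it, otherwise a copy
    print(name, a.flags.c_contiguous, np.shares_memory(a, flat))

# A non-contiguous input can always be made C-contiguous, at the cost of a copy.
fixed = np.ascontiguousarray(transposed)
print(fixed.flags.c_contiguous)
```

Only the C-contiguous array can be flattened without copying, which is why that layout is the natural candidate for a zero-copy path.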
   
   I am also not sure if this is a good way to do this, or if there is a 
more succinct way.
   
   This was written entirely by GH Copilot. You can see my dialog with Copilot 
as I tweaked its directions and chose an implementation in 
https://github.com/NickCrews/arrow/pull/3
   
   Perhaps this logic should be pulled into its own `_from_n_dim_numpy(np_arr)` 
helper function to keep the larger control flow of the function clearer. Let 
me know if you think so.
   
   ### Are these changes tested?
   
   Yes, I think adequately. It doesn't actually verify that the zero-copy path 
is used, just that the results are correct. I didn't really want to deal with 
messing with monkeypatching/spying on things to detect the 0-copy, but can add 
this if we want to verify.
   
   We also just compare the results to the result via the .tolist() path, but 
perhaps we should instead write out the actual expected value as boilerplate so 
that it is even more obvious what the expected behavior is.
   
   ### Are there any user-facing changes?
   
   No breaking changes, only newly supported features!
   

