jorisvandenbossche commented on PR #34883: URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503146111
You don't necessarily "lose" that information: you still have it available in the extension type. It won't be embedded in the numpy array, but neither are the dimension names (of course, I know this is a bit different, as names can never be embedded in a numpy array). So in general you might still need to process the extension type attributes to make sure you handle the resulting numpy array correctly.

For example, assume you have a channels-last array (logically NCHW, physically NHWC). This information could be stored as `"dim_names": ["H", "W", "C"]`, only as a permutation (`[2, 0, 1]`), or both. So if you convert such an array to numpy, you still need to check for those aspects as well. You also need to take into account that, if `to_numpy_ndarray` already returned a permuted array by default, the `dim_names` were not permuted along with it (this is different from a Tensor, where the permuted dim names can be attached to the Tensor object).

I'm just mentioning that there are various related aspects that can make this more complex. So I still think it is a _valid_ option to say that `to_numpy_ndarray` simply ignores all optional metadata (and thus always returns the physical array), and that it is up to the user to decide what to do with the optional metadata (permutation and/or dim_names).

Now, as mentioned above, I don't have that strong an opinion about it, so I'm also fine if in the end `to_numpy_ndarray` permutes by default (in that case I would prefer adding a keyword to turn it off, since it's not that straightforward to convert the permuted array back to its physical form). In the short term, erroring for those ambiguous cases is also fine; that leaves all options open for fine-tuning this in a follow-up PR.
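To make the bookkeeping concrete, here is a minimal plain-Python sketch of what a user would have to do with the metadata. It makes no pyarrow calls, and `apply_permutation`, `invert_permutation` and `batch_axes` are hypothetical helpers, not pyarrow API. It assumes that logical axis `i` is physical axis `permutation[i]` (the same reordering as `numpy.transpose(arr, permutation)`) and that `dim_names` describe the physical layout, which matches the channels-last example: physical `["H", "W", "C"]` with `[2, 0, 1]` gives logical `C, H, W`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_permutation(physical: Sequence[T], permutation: Sequence[int]) -> List[T]:
    """Reorder per-element physical-layout items (shape or dim names) into logical order.

    Assumed convention (hypothetical helper): logical axis i is physical axis
    permutation[i], i.e. the same reordering as numpy.transpose(arr, permutation).
    """
    if sorted(permutation) != list(range(len(physical))):
        raise ValueError(f"invalid permutation {list(permutation)} for {len(physical)} dims")
    return [physical[p] for p in permutation]


def invert_permutation(permutation: Sequence[int]) -> List[int]:
    """Return the permutation that takes the logical layout back to the physical one."""
    inverse = [0] * len(permutation)
    for logical_axis, physical_axis in enumerate(permutation):
        inverse[physical_axis] = logical_axis
    return inverse


def batch_axes(permutation: Sequence[int]) -> List[int]:
    """Extend a per-element permutation to an (N, ...) batch: axis 0 (N) stays first."""
    return [0] + [p + 1 for p in permutation]


if __name__ == "__main__":
    # Hypothetical channels-last element: physical HWC, 32x32 pixels, 3 channels.
    physical_shape = (32, 32, 3)
    physical_names = ["H", "W", "C"]  # assumed: dim_names describe the physical layout
    permutation = [2, 0, 1]

    print(apply_permutation(physical_shape, permutation))  # [3, 32, 32]
    # A permuted array does not carry its names; they must be reordered separately.
    logical_names = apply_permutation(physical_names, permutation)
    print(logical_names)  # ['C', 'H', 'W']

    # Going back to the physical form needs the inverse permutation.
    inverse = invert_permutation(permutation)
    print(inverse)  # [1, 2, 0]
    print(apply_permutation(logical_names, inverse))  # ['H', 'W', 'C']

    print(batch_axes(permutation))  # [0, 3, 1, 2]: NHWC -> NCHW
```

With NumPy, the same reordering of a whole `(N, H, W, C)` array would be `np.transpose(arr, batch_axes(permutation))`, and `np.transpose(arr, batch_axes(invert_permutation(permutation)))` undoes it; in both cases the names still have to be reordered by hand.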
