aboderinsamuel commented on PR #50203: URL: https://github.com/apache/arrow/pull/50203#issuecomment-4779363878
Thanks @tadeja @rok @AlenkaF, really helpful.

I reproduced tadeja's cases and both are real: a wrong-shape array (e.g. (3,2) into a (2,3) tensor) is silently accepted, and permuted tensor types store the wrong layout (the to_numpy round-trip doesn't return the input).

Root cause: array.pxi swaps the extension type for its storage_type before conversion and re-wraps the result with wrap_array afterwards, so the C++ converter only ever sees the flat fixed_size_list; it has no way to know the tensor's shape or permutation. The flatten is correct for a plain fixed_size_list, but shape validation and permutation handling need to live in the Python/Cython layer, where the FixedShapeTensorType is still intact.

Plan:

1. C++: switch from PyArray_Ravel to the explicit PyArray_CheckFromAny + NPY_ARRAY_C_CONTIGUOUS approach and read PyArray_DATA directly (per @AlenkaF). This should also let me handle the byte-order case in the same call.
2. Cython: validate each element's shape against the tensor's shape (so (3,2) into (2,3) errors), and for now reject permuted types with a clear error, leaving full permutation support as a follow-up, unless you'd prefer I handle the transpose here.
3. Tighten the comment and error message, and document that we always output C order (per @rok).

Does this direction sound right before I push?
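To make the root cause and the proposed checks concrete, here is a minimal NumPy-only sketch of the logic described above. It is not the actual pyarrow code: `flatten_tensor_element` is a hypothetical helper, `shape` and `permutation` stand in for the FixedShapeTensorType's attributes, and `np.ascontiguousarray` plus `dtype.newbyteorder("=")` stand in for the C-level PyArray_CheckFromAny + NPY_ARRAY_C_CONTIGUOUS call and its byte-order handling.

```python
import numpy as np

# Root cause in miniature: a (2, 3) and a (3, 2) array flatten to the same
# six values, so a flat fixed_size_list of size 6 cannot tell them apart.
a = np.arange(6).reshape(2, 3)
b = np.arange(6).reshape(3, 2)
assert a.ravel().tolist() == b.ravel().tolist()


def flatten_tensor_element(value, shape, permutation=None):
    """Hypothetical helper (not a pyarrow API) sketching the proposed checks.

    shape: the tensor type's shape, e.g. (2, 3).
    permutation: None or a list of dimension indices; only the identity
    permutation is accepted in this sketch.
    Returns a 1-D, C-ordered, native-byte-order copy or view of `value`.
    """
    ndim = len(shape)
    # Plan step 2: reject permuted types until transpose support lands.
    if permutation is not None and list(permutation) != list(range(ndim)):
        raise NotImplementedError(
            f"permutation {list(permutation)} is not supported yet"
        )
    arr = np.asarray(value)
    # Plan step 2: compare full shapes, not just element counts, so that
    # (3, 2) is rejected for a (2, 3) tensor even though both hold 6 values.
    if arr.shape != tuple(shape):
        raise ValueError(
            f"expected an element of shape {tuple(shape)}, got {arr.shape}"
        )
    # Plan steps 1 and 3: normalize to C order and native byte order.
    # np.ascontiguousarray stands in for the C-level
    # PyArray_CheckFromAny(..., NPY_ARRAY_C_CONTIGUOUS, ...) call.
    native = arr.dtype.newbyteorder("=")
    arr = np.ascontiguousarray(arr, dtype=native)
    return arr.reshape(-1)
```

For example, `flatten_tensor_element(np.asfortranarray(a), (2, 3))` returns the values 0 to 5 in row-major order even though the input is Fortran-ordered, while passing `b` with shape (2, 3) raises `ValueError` instead of being silently accepted.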
