rok commented on issue #43614: URL: https://github.com/apache/arrow/issues/43614#issuecomment-2276005166
> > Yes, but we can't really express string tensor with `FixedShapeTensorArray`. Perhaps we could with `VariableShapeTensorArray` where the last dimension is length of the string? :)
>
> What do you mean with "express"? The "fixed shape" vs "variable shape" in the type name is about the _number_ of values in the tensor elements, and not about whether those individual values are stored in a fixed width storage layout or not. (of course, for conversion to tensor libraries like numpy or pytorch, this matters, but not for Arrow itself)

Ah, sorry, for some reason I thought `FixedSizeList` was also fixed-width, which it is not if we're storing strings. Then `FixedShapeTensorArray` should be just fine for this case indeed.

What would then be the best way to enable `to_numpy_ndarray`?

1. variable bit-width `Tensor` to use the current codepath
2. add new C++ method for this exact case
3. reuse the Python code path from 15.0.0
