jorisvandenbossche commented on PR #33925: URL: https://github.com/apache/arrow/pull/33925#issuecomment-1434293575
> In [RAPIDS libcudf](https://github.com/rapidsai/cudf), we would use a nested List type to represent the proposed Tensor type. In the case where `permutation` is not provided, I believe we could use the elements zero-copy and then create offsets based on the `shape` parameter.

@GregoryKimball it seems that cudf only has a variable-size List type (https://docs.rapids.ai/api/cudf/stable/user_guide/cudf.listdtype)? The proposal here uses a fixed-size list type, which has no offsets. You can of course use a variable-size List type as well. However, every tensor element has the same shape for this extension type (there is discussion about adding a second type with variable shape), so the offsets increase by a constant step and are somewhat superfluous. The main buffer with all the values can still be used zero-copy, and that is the most important part here.

> would correspond to these List column children:

It's up to cudf to decide exactly which data type to use, but I am not sure a _nested_ list type is needed. For your example using `LIST<LIST<INT32>>`, you could also use `LIST<INT32>`, where each element of the list array is one flattened tensor element (the `elements` array stays the same; you just have fewer offsets). In the end, which representation is most useful will depend on what exactly you want to do with this data.
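To make the offset arithmetic concrete, here is a minimal pure-Python sketch. It does not use the cudf or Arrow APIs. It assumes row-major storage with no `permutation`, and the helper names `flat_list_offsets` and `nested_list_offsets` are hypothetical. It shows that the offsets for both the flat `LIST<INT32>` layout and the nested `LIST<LIST<INT32>>` layout can be derived from `shape` and the element count alone, while the shared values buffer is left untouched.

```python
from math import prod  # Python 3.8+


def flat_list_offsets(num_elements, shape):
    """Offsets for a LIST<T> column where each list entry holds one whole
    tensor element, flattened row-major into prod(shape) values."""
    size = prod(shape)
    # Constant step: entry i spans values[i * size : (i + 1) * size].
    return [i * size for i in range(num_elements + 1)]


def nested_list_offsets(num_elements, shape):
    """Offsets for 2-D tensor elements stored as LIST<LIST<T>>.

    The outer offsets index into the inner lists (one per row); the inner
    offsets index into the shared values buffer."""
    rows, cols = shape
    outer = [i * rows for i in range(num_elements + 1)]
    inner = [j * cols for j in range(num_elements * rows + 1)]
    return outer, inner


# Hypothetical example: 3 tensor elements of shape (2, 3) sharing one buffer
# of 18 values, which is reused as-is (zero-copy) by both layouts.
values = list(range(18))
offs = flat_list_offsets(3, (2, 3))
second_element = values[offs[1]:offs[2]]  # the 6 values of element 1
outer, inner = nested_list_offsets(3, (2, 3))
```

With a fixed-size list type, element `i` simply starts at `i * prod(shape)`, so no offsets buffer is stored at all. The flat layout needs one offsets array of length `n + 1`. The nested layout needs two, one of which has `n * rows + 1` entries.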
