[GitHub] [arrow] jorisvandenbossche commented on pull request #33925: GH-33923: [Docs] Tensor canonical extension type specification

via GitHub Fri, 17 Feb 2023 00:29:48 -0800


jorisvandenbossche commented on PR #33925:
URL: https://github.com/apache/arrow/pull/33925#issuecomment-1434293575


   > In [RAPIDS libcudf](https://github.com/rapidsai/cudf), we would use a nested List type to represent the proposed Tensor type. In the case where `permutation` is not provided, I believe we could use the elements zero-copy and then create offsets based on the `shape` parameter.
   
   @GregoryKimball it seems that cudf only has a variable size List type (https://docs.rapids.ai/api/cudf/stable/user_guide/cudf.listdtype)? In the proposal here we use a fixed size list type, which doesn't have offsets. Of course you can use a variable size List type as well. But every tensor element has the same shape (for this extension type; there is discussion to add a second type with variable shape), so the offsets increase by a constant step, namely the number of values per tensor (the product of the `shape` dimensions). That makes them somewhat superfluous.
   But the main buffer with all values can still be used zero-copy (which is most important here).
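The following is a minimal sketch in plain Python of why the offsets are redundant for fixed-shape tensors. It is not the pyarrow or cudf API. Given only the number of tensors and the shape, the offsets a variable size list would need can be computed, and each tensor can be sliced out of the shared values buffer without copying. The example data, the shape `(2, 2)`, and the helper names `implied_offsets` and `tensor_views` are hypothetical. A `memoryview` over an `array` stands in for an Arrow values buffer.

```python
import math
from array import array

# Hypothetical data: 3 tensors of shape (2, 2), stored like a fixed size
# list stores them: one flat values buffer and no offsets buffer.
shape = (2, 2)
num_tensors = 3
values = array("i", range(num_tensors * math.prod(shape)))


def implied_offsets(num_tensors, shape):
    """Offsets a variable size list would need for fixed-shape tensors.

    They always grow by the same step (the product of the shape
    dimensions), so they carry no information beyond num_tensors and shape.
    """
    step = math.prod(shape)
    return [i * step for i in range(num_tensors + 1)]


def tensor_views(values, offsets):
    """Per-tensor slices of the shared values buffer.

    Slicing a memoryview does not copy, so every slice refers to the same
    underlying buffer (an analogue of zero-copy reuse).
    """
    buf = memoryview(values)
    return [buf[start:end] for start, end in zip(offsets, offsets[1:])]


offsets = implied_offsets(num_tensors, shape)
views = tensor_views(values, offsets)
print(offsets)            # [0, 4, 8, 12]
print(views[1].tolist())  # [4, 5, 6, 7]
```

Because the slices are views, a write to `values` is visible through the corresponding view. This shared-buffer property is what makes reusing the values zero-copy, whichever list type sits on top.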
   
   > would correspond to these List column children:
   
   It's up to cudf to decide which data type to use exactly, but I am not sure a _nested_ list type is needed. For your example where you are using `LIST<LIST<INT32>>`, you could also use `LIST<INT32>`, where each element of that list array is one flattened tensor element (the `elements` array is still the same; you just have fewer offsets). In the end, which representation is most useful will depend on exactly what you want to do with this data.
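The following plain-Python sketch makes the nested-versus-flat comparison concrete. It is an assumed example, not cudf or Arrow code. It computes the offsets each representation needs for the same `elements` array. The shape `(2, 3)`, the tensor count, and the helper names `flat_offsets` and `nested_offsets` are illustrative assumptions. The nested variant adds one offsets level per dimension, while the flat variant keeps a single level.

```python
import math


def flat_offsets(num_tensors, shape):
    """Offsets for LIST<INT32>: one list element per flattened tensor."""
    step = math.prod(shape)
    return [i * step for i in range(num_tensors + 1)]


def nested_offsets(num_tensors, shape):
    """Offsets for each level of a nested list, one level per dimension.

    For shape (d0, d1) this models LIST<LIST<INT32>>: the outer level has
    one list per tensor with d0 children, and the inner level has one list
    per row with d1 values each.
    """
    levels = []
    num_lists = num_tensors
    for dim in shape:
        # Every list at this level has exactly `dim` children.
        levels.append([i * dim for i in range(num_lists + 1)])
        num_lists *= dim
    return levels


shape = (2, 3)
num_tensors = 2
print(flat_offsets(num_tensors, shape))    # [0, 6, 12]
print(nested_offsets(num_tensors, shape))  # [[0, 2, 4], [0, 3, 6, 9, 12]]
```

Both variants index into the same 12-value `elements` array. The flat `LIST<INT32>` form needs 3 offsets. The nested form needs 3 + 5.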

