[I] [Python] Implement a repr for Array and RecordBatch/Table for non-CPU data [arrow]

via GitHub Wed, 15 May 2024 03:05:06 -0700


jorisvandenbossche opened a new issue, #41664:
URL: https://github.com/apache/arrow/issues/41664


   Currently, if you have a pyarrow `Array` or `RecordBatch`/`Table` object 
that is backed by non-CPU data, just displaying the object (`__repr__`) 
crashes, because our `PrettyPrint` functionality assumes it deals with data on 
the CPU.
   
   At a minimum, we should make the repr _not_ crash, for example by first 
checking whether we have CPU data and, if not, printing only generic information 
(the array type or the schema) without a preview of the data.  
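
   A rough sketch of that fallback, in plain Python so it runs without a 
device. `FakeArray`, its `is_cpu` flag and `safe_repr` are hypothetical 
stand-ins invented for illustration, not pyarrow classes or functions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeArray:
    """Hypothetical stand-in for an array object; not a pyarrow class."""
    type_name: str
    values: List[int]
    is_cpu: bool  # assumed flag: True if the buffers live in CPU memory


def safe_repr(arr: FakeArray) -> str:
    """Return a repr that never reads the values of non-CPU data."""
    header = f"<FakeArray type={arr.type_name} length={len(arr.values)}>"
    if not arr.is_cpu:
        # Only metadata is printed; the values are never touched.
        return header + "\n[data not on CPU, preview omitted]"
    return header + "\n" + repr(arr.values)
```

   The point is only that the device check comes before any code path that 
reads the values, so the repr degrades to metadata instead of crashing.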
   
   But I think we could also do better by actually ensuring the repr works and 
is informative for non-CPU data as well. For the pretty printing part of the 
repr, we only need a small subset of the data (by default the first and last 5 
elements), and copying such a portion to the CPU just for printing should 
generally be fine.
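
   To illustrate why the copy stays cheap, here is a minimal sketch of a 
head/tail preview. The `fetch(offset, count)` callback is a hypothetical 
placeholder for "slice and copy to CPU"; `preview`, `fake_fetch` and 
`device_data` are likewise made up for this example and are not part of 
pyarrow:

```python
from typing import Callable, List, Sequence


def preview(length: int,
            fetch: Callable[[int, int], Sequence],
            window: int = 5) -> str:
    """Format the first and last `window` elements of a sequence.

    `fetch(offset, count)` is assumed to copy `count` elements starting
    at `offset` into CPU memory and return them.
    """
    if length <= 2 * window:
        # Small enough to show everything with a single copy.
        return "[" + ", ".join(str(v) for v in fetch(0, length)) + "]"
    head = fetch(0, window)
    tail = fetch(length - window, window)
    parts = [str(v) for v in head] + ["..."] + [str(v) for v in tail]
    return "[" + ", ".join(parts) + "]"


# Pretend this list lives on a device; `copied` records what was transferred.
device_data: List[int] = list(range(1000))
copied: List[int] = []


def fake_fetch(offset: int, count: int) -> List[int]:
    """Hypothetical device-to-CPU copy of a slice, with bookkeeping."""
    chunk = device_data[offset:offset + count]
    copied.extend(chunk)
    return chunk
```

   Calling `preview(len(device_data), fake_fetch)` transfers at most 
`2 * window` elements regardless of the array length, which is the property 
that makes copying for the repr acceptable.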
   
   If we implement this on the Python side, it depends on exposing the 
generic CopyTo functionality (https://github.com/apache/arrow/issues/41126) to 
copy to the CPU device. However, we could maybe also implement this on the C++ 
side, in `PrettyPrint` itself?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
