jorisvandenbossche opened a new issue, #41664: URL: https://github.com/apache/arrow/issues/41664
Currently, if you have a pyarrow `Array` or `RecordBatch`/`Table` object that is backed by non-CPU data, just displaying the object (`__repr__`) crashes, because our `PrettyPrint` functionality assumes it is dealing with data on the CPU.

At a minimum, we should make the repr _not_ crash, for example by first checking whether we have CPU data and, if not, printing only generic information (the array type or the schema) without a preview of the data.

But I think we could also do better by ensuring the repr actually works and is informative for non-CPU data as well. For the pretty-printing part of the repr, we only need a small subset of the data (by default the first and last 5 elements), and copying such a portion to the CPU just for printing should generally be fine.

If we implement this on the Python side, it depends on exposing the generic CopyTo functionality (https://github.com/apache/arrow/issues/41126) to copy to the CPU device. However, we could maybe also implement this on the C++ side in `PrettyPrint` itself?
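
To make the two options concrete, here is a minimal pure-Python sketch of the idea. It does not use pyarrow: `FakeDeviceArray`, its `copy_to_cpu` method, and `preview_repr` are hypothetical names invented for illustration, standing in for a device-backed array and for whatever CopyTo-style API ends up being exposed. It shows copying only the head and tail slices to the CPU for the preview, and falling back to generic type/length information if the copy is not possible.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDeviceArray:
    """Hypothetical stand-in for an array whose data may live off-CPU."""
    values: List[int]
    type_name: str = "int64"
    is_cpu: bool = False

    def __len__(self) -> int:
        return len(self.values)

    def slice(self, offset: int, length: int) -> "FakeDeviceArray":
        # Slicing stays on the same device and copies nothing to the CPU.
        return type(self)(self.values[offset:offset + length],
                          self.type_name, self.is_cpu)

    def copy_to_cpu(self) -> "FakeDeviceArray":
        # Hypothetical CopyTo-style call; a real one could fail or be missing.
        return type(self)(list(self.values), self.type_name, True)


def preview_repr(arr, window: int = 5) -> str:
    """Build a repr that never crashes on non-CPU data."""
    n = len(arr)
    header = f"<Array type={arr.type_name} length={n}>"
    try:
        if n <= 2 * window:
            parts = [arr]
        else:
            # Only the first and last `window` elements are needed.
            parts = [arr.slice(0, window), arr.slice(n - window, window)]
        # Copy just those small slices to the CPU, never the full array.
        parts = [p if p.is_cpu else p.copy_to_cpu() for p in parts]
    except Exception:
        # Minimum behaviour: generic information only, no data preview.
        return header
    chunks = [", ".join(str(v) for v in p.values) for p in parts]
    return header + "\n[" + ", ..., ".join(chunks) + "]"


if __name__ == "__main__":
    print(preview_repr(FakeDeviceArray(list(range(100)))))
```

The same structure would apply whether the slice-and-copy step lives in Python or inside the C++ `PrettyPrint`; only the location of the `try`/fallback changes.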
