jorisvandenbossche commented on pull request #12091: URL: https://github.com/apache/arrow/pull/12091#issuecomment-1016192978
I was testing this locally, and I think we might eventually want both this and some character cap as in https://github.com/apache/arrow/pull/12148 (but maybe at the "scalar level").

The data I was testing with is a large table with geometry data. For the normal columns (some ints and strings), this PR is a nice improvement. But the geometry column is essentially a binary column with big blobs (for this specific dataset, individual scalar values have a length of up to 8000), so even printing only 10 values of it still floods the console.

So long term (not necessarily for 7.0 though :)), we might want, in addition to this PR, to limit the max size of the repr for individual scalars (when printed in a table, not when printed as an individual scalar), instead of limiting the max size of the column as you do in https://github.com/apache/arrow/pull/12148. Limiting it per scalar might also be easier, because then you don't get the complexities around truncating the column repr nicely between scalars (as discussed in https://github.com/apache/arrow/pull/12148#discussion_r784329846 ).
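
To make the per-scalar idea concrete, here is a rough sketch in plain Python. It is not pyarrow's actual formatting code, and `truncate_scalar_repr`, `preview_column`, and `max_chars` are hypothetical names used only for illustration. It only shows that capping each value's repr avoids having to decide where to cut the column string between scalars.

```python
def truncate_scalar_repr(value, max_chars=40, marker="..."):
    """Return repr(value), cut to at most max_chars characters.

    Hypothetical helper: the names and the default cap are assumptions,
    not part of any existing pyarrow API.
    """
    if max_chars < len(marker):
        raise ValueError("max_chars must be at least len(marker)")
    text = repr(value)
    if len(text) <= max_chars:
        return text
    # Keep the head of the repr and append a marker showing it was cut.
    return text[: max_chars - len(marker)] + marker


def preview_column(values, max_chars=40):
    """Format a handful of values, capping each scalar independently."""
    parts = [truncate_scalar_repr(v, max_chars) for v in values]
    return "[" + ", ".join(parts) + "]"


if __name__ == "__main__":
    # A blob of length 8000, similar in size to the geometry values above.
    blob = b"\x00" * 8000
    print(preview_column([1, "abc", blob]))
```

Because every element is shortened on its own, a column preview of 10 values stays bounded at roughly 10 times `max_chars`, regardless of how large any single blob is.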
