jorisvandenbossche commented on pull request #12091:
URL: https://github.com/apache/arrow/pull/12091#issuecomment-1016192978


   I was testing this locally, and I think we might eventually want both 
this and some character cap as in https://github.com/apache/arrow/pull/12148 
(but maybe at the "scalar level"). 
   
   The data I was testing with is a large table with geometry data. For the 
normal columns (some ints and strings) this PR is a nice improvement. But the 
geometry column is essentially a binary column with big blobs (for this 
specific dataset, individual scalar values have a length of up to 8000). So 
even when printing only 10 values of that column, the output still floods the 
console. 
   
   So long term (not necessarily for 7.0 though :)), we might want, in 
addition to this PR, to limit the max size of the repr for individual scalars 
(when printed in a table, not when printed as an individual scalar), instead 
of limiting the max size of the whole column as you do in 
https://github.com/apache/arrow/pull/12148 . Limiting it per scalar might also 
be easier, because then you don't get the complexities around truncating the 
column repr nicely between scalars, as discussed in 
https://github.com/apache/arrow/pull/12148#discussion_r784329846 .
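
   To make the "per scalar" idea concrete, here is a rough pure-Python sketch 
of what I have in mind. It is not the pyarrow implementation or API: the names 
`truncate_scalar_repr`, `format_column` and `max_chars`, the default of 40 
characters, and the choice to show binary values as upper-case hex are all 
assumptions made up for illustration. 

```python
ELLIPSIS = "..."


def truncate_scalar_repr(value, max_chars=40):
    """Return a display string for one value, at most max_chars long.

    Hypothetical helper: binary values are rendered as upper-case hex
    (an assumption for this sketch), everything else via repr().
    """
    if isinstance(value, (bytes, bytearray)):
        text = bytes(value).hex().upper()
    else:
        text = repr(value)
    if len(text) <= max_chars:
        return text
    if max_chars <= len(ELLIPSIS):
        # No room for head + ellipsis + tail, so just cut hard.
        return text[:max_chars]
    keep = max_chars - len(ELLIPSIS)
    head = (keep + 1) // 2  # characters kept from the start
    tail = keep // 2        # characters kept from the end
    return text[:head] + ELLIPSIS + (text[-tail:] if tail else "")


def format_column(values, max_chars=40):
    """Format a column one value per line, truncating each value on its own."""
    lines = ["  " + truncate_scalar_repr(v, max_chars) for v in values]
    return "[\n" + ",\n".join(lines) + "\n]"


if __name__ == "__main__":
    blob = bytes(range(256)) * 32  # ~8000-byte value, like the geometry blobs
    print(format_column([blob, b"\x01\x02", 42, "some string"]))
```

   Because each value is shortened independently, the column output stays a 
list of complete (if abbreviated) entries, and there is no need to decide where 
to cut the column repr between scalars. 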

