timsaucer opened a new pull request, #1036: URL: https://github.com/apache/datafusion-python/pull/1036
# Which issue does this PR close?

None.

# Rationale for this change

The notebook rendering of DataFrames is very useful, but it can be enhanced. This PR adds quality-of-life improvements such as:

- The table is now scrollable both vertically and horizontally.
- Instead of collecting an arbitrary 10 rows, we collect up to 2 MB worth of data.
- For scalars that render to long strings (more than 25 characters), we truncate them and add a `...` button that expands the cell so you can view it in its entirety.
- When more data are available than are displayed, we indicate to the user that the data are truncated.
- When no data are returned, we tell the user so.

# What changes are included in this PR?

This PR adds a feature to collect record batches and uses their size estimate to collect up to 2 MB worth of data. This is typically enough for most use cases to review the data, but it is a constant we can update. We then determine how many rows to show to the user, which is either 2 MB worth (a record batch will easily have more than this) or at least 20 rows (also up for changing). We then render this as an HTML table.

In the rendering, we check whether each individual cell contains more than 25 characters. If so, we show a 25-character snippet of the string representation of the data and a `...` button with a JavaScript call that updates which data are displayed in the cell.

# Are there any user-facing changes?

Yes, but not to the API. Any user who uses Jupyter notebooks will experience these enhanced tables. See the screenshots below for examples:

<img width="1022" alt="table-views-2" src="https://github.com/user-attachments/assets/3098f9a4-f5a5-4658-a3f5-dd6ba7706e4b" />

<img width="1127" alt="table-views-3" src="https://github.com/user-attachments/assets/c73a6118-75ea-4a40-9e50-2aa5718be03c" />
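For readers who want a concrete picture of the row budgeting and cell truncation described above, here is a rough sketch in plain Python. It is not the code in this PR: `Batch`, `rows_to_display`, and `format_cell` are hypothetical names, the constants mirror the numbers quoted in the description, and scaling the row count in proportion to the size estimate is an assumption about how "2 MB worth" of rows might be derived.

```python
import html
from dataclasses import dataclass

# Constants taken from the PR description; the exact byte value of "2 MB" is assumed.
MAX_TABLE_BYTES = 2 * 1024 * 1024
MIN_TABLE_ROWS = 20
MAX_CELL_LENGTH = 25


@dataclass
class Batch:
    """Hypothetical stand-in for a record batch: a row count and a size estimate."""
    num_rows: int
    size_bytes: int


def rows_to_display(batches):
    """Return (rows_to_show, has_more) for an iterable of batches.

    Batches are pulled until both the byte budget and the minimum row
    count are reached, so the whole result is never collected eagerly.
    """
    it = iter(batches)
    total_rows = 0
    total_bytes = 0
    for batch in it:
        total_rows += batch.num_rows
        total_bytes += batch.size_bytes
        if total_bytes >= MAX_TABLE_BYTES and total_rows >= MIN_TABLE_ROWS:
            break

    if total_bytes > MAX_TABLE_BYTES:
        # Assumed heuristic: keep the fraction of rows that fits the budget,
        # but never fewer than MIN_TABLE_ROWS (or all rows if fewer exist).
        scaled = total_rows * MAX_TABLE_BYTES // total_bytes
        rows = min(total_rows, max(MIN_TABLE_ROWS, scaled))
    else:
        rows = total_rows

    # Data are truncated if rows were dropped or unread batches remain.
    has_more = rows < total_rows or next(it, None) is not None
    return rows, has_more


def format_cell(value):
    """Return (escaped_snippet, truncated) for one table cell."""
    text = str(value)
    truncated = len(text) > MAX_CELL_LENGTH
    snippet = text[:MAX_CELL_LENGTH] if truncated else text
    return html.escape(snippet), truncated
```

In a real renderer, the `truncated` flag would decide whether to emit the `...` button next to the snippet, and `has_more` would drive the "data truncated" notice.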