Spaarsh commented on PR #1041: URL: https://github.com/apache/datafusion-python/pull/1041#issuecomment-2702344410
> We have 3 PRs that are all impacting the `__repr__` and `_repr_html_`. We have:
>
> * This one which does the additional data checking with a collect()
> * [refactor: collect dataframe as stream in `__repr__` #1015](https://github.com/apache/datafusion-python/pull/1015) which collects until we get to 10 rows
> * [Scrollable python notebook table rendering #1036](https://github.com/apache/datafusion-python/pull/1036) which collects 2MB or 20 rows but just for the html rendering
>
> I suggest we consolidate. My proposal is:
>
> * we merge in [refactor: collect dataframe as stream in `__repr__` #1015](https://github.com/apache/datafusion-python/pull/1015) as it is
> * I update [Scrollable python notebook table rendering #1036](https://github.com/apache/datafusion-python/pull/1036) to combine the collecting operations to be either by minimum number of rows or data size
> * We close [_repr_ and _html_repr_ show '... and additional rows' message #1041](https://github.com/apache/datafusion-python/pull/1041) in favor of the truncation message from 1036 (I'll add it to `__repr__` also).
>
> Does this sound reasonable?
>
> Also, it's incredible to have so many people pitching in at the same time. I will try to spend some time this weekend to organize some of the open issues to make it easier to not duplicate effort.

That sounds reasonable, except for one problem. As @kosiew suggested in this [comment](), we may need to change how `_repr_html_` limits the rows it prints. Either we merge the notebook PR (#1036) first and continue discussing the code change here, or you could implement these suggestions in that PR itself.
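The proposal to combine the row-based and size-based collection could look roughly like the following. This is a minimal pure-Python sketch, not the code from #1015 or #1036: `Batch` and `collect_for_display` are hypothetical stand-ins for a DataFusion record-batch stream, and it assumes "either by minimum number of rows or data size" means "stop as soon as either limit is reached". The default limits (20 rows, 2 MB) come from the description of #1036 above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Batch:
    """Hypothetical stand-in for a record batch: its rows plus its size in bytes."""
    rows: List[tuple]
    nbytes: int


def collect_for_display(
    batches: Iterable[Batch],
    min_rows: int = 20,
    max_bytes: int = 2 * 1024 * 1024,
) -> Tuple[List[tuple], bool]:
    """Pull batches until at least min_rows rows or max_bytes bytes are buffered.

    Returns the buffered rows and a flag telling the caller whether another
    batch follows, so it can append a truncation message such as
    "... and additional rows".
    """
    rows: List[tuple] = []
    total_bytes = 0
    it: Iterator[Batch] = iter(batches)
    for batch in it:
        rows.extend(batch.rows)
        total_bytes += batch.nbytes
        if len(rows) >= min_rows or total_bytes >= max_bytes:
            # Stop pulling from the stream; peek once to see if another batch exists.
            has_more = next(it, None) is not None
            return rows, has_more
    # The stream ran out before either limit was reached: nothing was truncated.
    return rows, False
```

Because the loop stops at the first limit hit, a few wide rows can end collection early via the byte budget, while many narrow rows end it via the row count, so neither `__repr__` nor `_repr_html_` has to materialize the whole DataFrame.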
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org