Spaarsh commented on PR #1041:
URL: https://github.com/apache/datafusion-python/pull/1041#issuecomment-2702344410

   > We have 3 PRs that are all impacting the `__repr__` and `_repr_html_`. We have:
   > 
   > * This one, which does the additional data checking with a collect()
   > * [refactor: collect dataframe as stream in `__repr__` #1015](https://github.com/apache/datafusion-python/pull/1015), which collects until we get to 10 rows
   > * [Scrollable python notebook table rendering #1036](https://github.com/apache/datafusion-python/pull/1036), which collects 2MB or 20 rows but just for the html rendering
   > 
   > I suggest we consolidate. My proposal is:
   > 
   > * we merge in [refactor: collect dataframe as stream in `__repr__` #1015](https://github.com/apache/datafusion-python/pull/1015) as it is
   > * I update [Scrollable python notebook table rendering #1036](https://github.com/apache/datafusion-python/pull/1036) to combine the collecting operations to be either by minimum number of rows or data size
   > * We close [_repr_ and _html_repr_ show '... and additional rows' message #1041](https://github.com/apache/datafusion-python/pull/1041) in favor of the truncation message from 1036 (I'll add it to `__repr__` also).
   > 
   > Does this sound reasonable?
   > 
   > Also, it's incredible to have so many people pitching in at the same time. I will try to spend some time this weekend organizing some of the open issues so it is easier to avoid duplicating effort.
   
   It sounds reasonable, except there's one problem. As suggested by @kosiew in this [comment](), there may be a need to change how `_repr_html_` limits the rows to be printed. Either we merge the notebook PR first and discuss the code change here, or you could implement these suggestions in that PR itself.
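
   A minimal sketch of the consolidated collection rule proposed for #1036, assuming it means: pull batches lazily from the stream, stop as soon as either a row cap or a byte budget is reached, and remember whether anything was left over so the truncation message can be shown. The 20-row / 2MB defaults come from the description of #1036 above; `Batch`, `collect_for_display`, and `render` are hypothetical names, and `Batch` is a plain stand-in for a record batch, not the datafusion-python API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Batch:
    """Hypothetical stand-in for a record batch: its rows plus an approximate size in bytes."""
    rows: List[tuple]
    nbytes: int


def collect_for_display(
    batches: Iterable[Batch],
    max_rows: int = 20,
    max_bytes: int = 2 * 1024 * 1024,
) -> Tuple[List[tuple], bool]:
    """Pull batches until the row cap or the byte budget is reached.

    Returns the rows to render and a flag saying whether anything was
    left undisplayed, so the caller can append a truncation message.
    """
    collected: List[tuple] = []
    used_bytes = 0
    it: Iterator[Batch] = iter(batches)
    for batch in it:
        # Take only as many rows from this batch as the row cap still allows.
        room = max_rows - len(collected)
        taken = batch.rows[:room]
        collected.extend(taken)
        # Charge the byte budget in proportion to the rows actually taken.
        # The budget can be overshot by at most one batch.
        if batch.rows:
            used_bytes += batch.nbytes * len(taken) // len(batch.rows)
        if len(taken) < len(batch.rows):
            # Rows were cut from this batch, so the output is truncated.
            return collected, True
        if len(collected) >= max_rows or used_bytes >= max_bytes:
            # A limit was hit exactly at a batch boundary: pull from the
            # stream once more to learn whether any rows remain.
            for rest in it:
                if rest.rows:
                    return collected, True
            return collected, False
    return collected, False


def render(rows: List[tuple], truncated: bool) -> str:
    """Plain-text rendering with the '... and additional rows' message from #1041."""
    lines = [", ".join(map(str, row)) for row in rows]
    if truncated:
        lines.append("... and additional rows")
    return "\n".join(lines)


if __name__ == "__main__":
    # Three batches of 8 rows each: the 20-row cap cuts the third batch.
    stream = [Batch([(i,) for i in range(8)], 800) for _ in range(3)]
    rows, truncated = collect_for_display(stream)
    print(len(rows), truncated)
```

   Returning a `truncated` flag alongside the rows lets both `__repr__` and `_repr_html_` share one collection routine and differ only in rendering. The trade-off is that, when a limit lands exactly on a batch boundary, one extra batch is pulled from the stream just to decide whether to show the message.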


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

