kosiew opened a new pull request, #1119: URL: https://github.com/apache/datafusion-python/pull/1119
## Which issue does this PR close?

Partial fix for #1078

## Rationale for this change

This change improves the flexibility and performance of DataFrame rendering in notebooks and other environments. It introduces fine-grained control over memory usage, row display counts, and HTML output optimization, making large data exploration more efficient and user-friendly. It also cleans up validation logic for formatter settings and supports custom styling providers more robustly.

## What changes are included in this PR?

- Added `max_memory_bytes`, `min_rows_display`, and `repr_rows` parameters to the DataFrame HTML formatter.
- Updated Python `configure_formatter` API and documentation to expose new parameters.
- Improved internal validation for formatter parameters (`_validate_positive_int`, `_validate_bool`).
- Introduced `FormatterConfig` in Rust to carry display configuration across DataFrame rendering.
- Updated Rust `collect_record_batches_to_display` to respect new memory and row limits dynamically.
- New tests to cover memory limits, row controls, and style provider usage.
- Documentation updates explaining memory and performance optimizations, including `use_shared_styles`.

## Are these changes tested?

✅ Yes, additional tests have been added:

- Validation of new parameters in `test_html_formatter_memory_and_rows`.
- Verification of custom style provider behavior combined with formatter parameters.
- Edge case testing for extreme values (e.g., very high/low limits).

## Are there any user-facing changes?

✅ Yes:

- Users can now configure how much memory and how many rows are used when displaying DataFrames.
- Improved error messages for invalid formatter configurations.
- Better performance when rendering large numbers of DataFrames in Jupyter notebooks or other rich environments.
- Documentation updated to reflect the new options available.