Spaarsh commented on issue #1026: URL: https://github.com/apache/datafusion-python/issues/1026#issuecomment-2689730430
I'd like to work on this issue. Adding a few lines of code along the lines of:

```
fn __repr__(&self, py: Python) -> PyDataFusionResult<String> {
    // Fetch one row more than we display so we can tell whether rows were cut off.
    let df = self.df.as_ref().clone().limit(0, Some(11))?;
    let batches = wait_for_future(py, df.collect())?;
    let num_rows = batches.iter().map(|batch| batch.num_rows()).sum::<usize>();

    // Keep at most 10 rows for display, slicing across batch boundaries
    // (the 11 fetched rows may arrive in several batches).
    let mut remaining = 10;
    let mut limited_batches = Vec::new();
    for batch in &batches {
        if remaining == 0 {
            break;
        }
        let len = remaining.min(batch.num_rows());
        limited_batches.push(batch.slice(0, len));
        remaining -= len;
    }

    match pretty::pretty_format_batches(&limited_batches) {
        Ok(batch) => {
            if num_rows > 10 {
                Ok(format!("DataFrame()\n{batch}\nand more..."))
            } else {
                Ok(format!("DataFrame()\n{batch}"))
            }
        }
        Err(err) => Ok(format!("Error: {:?}", err.to_string())),
    }
}
```

should suffice, I suppose?

> You could also implement a "config" system like pandas uses, so the user can opt-in to displaying more columns or rows https://pandas.pydata.org/docs/user_guide/options.html#overview

As for the config, we'd need to decide on a particular format. I would suggest `toml`, since it is used by `Cargo`. But that in itself requires a new issue, since I am sure a host of other things could benefit from such a system. We could also start on it in this issue, if that is alright.
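To make the config idea a bit more concrete, here is a rough sketch of what an opt-in row limit could look like on the Rust side. `DisplayConfig`, `max_rows`, and `render_rows` are names made up for illustration; nothing like them exists in datafusion-python as far as this thread shows, and plain strings stand in for formatted record batches. How the values would be loaded (TOML or otherwise) is deliberately left open.

```rust
/// Hypothetical display options, loosely modelled on pandas' `display.max_rows`.
/// This struct is an assumption for illustration, not an existing API.
#[derive(Debug, Clone, PartialEq)]
pub struct DisplayConfig {
    /// Maximum number of rows to render before truncating.
    pub max_rows: usize,
}

impl Default for DisplayConfig {
    fn default() -> Self {
        // 10 matches the row limit used in the `__repr__` snippet above.
        Self { max_rows: 10 }
    }
}

/// Render pre-formatted rows, appending a marker when rows were cut off.
/// `rows` is a stand-in for the formatted batches of a real DataFrame.
pub fn render_rows(rows: &[&str], config: &DisplayConfig) -> String {
    let shown = rows
        .iter()
        .take(config.max_rows)
        .copied()
        .collect::<Vec<_>>()
        .join("\n");
    if rows.len() > config.max_rows {
        format!("DataFrame()\n{shown}\nand more...")
    } else {
        format!("DataFrame()\n{shown}")
    }
}
```

With something like this, `__repr__` would read the limit from the config instead of hard-coding 10, and a user who wants more rows would only need to raise `max_rows`.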