[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

GitBox Mon, 23 Aug 2021 07:12:45 -0700


andygrove commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r694009745




##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       Sorry to be a pain, but this seems confusing to me. Maybe it would be 
better to revert to your original code and document that the user can limit 
output by using `df.limit(20).show()`. Alternatively, we could add a 
separate `show_limit(limit: usize)` method where the user can specify how many 
rows they would like.
   
   The issue I have with this is that it shows 20 rows by default, and if the 
user wants more they need to add a limit, which is counterintuitive 
because a limit normally reduces the number of rows. Also, this code only works 
if the final operator is a limit, so it won't work consistently if the limit is 
wrapped in a sort, for example.
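
   To make the concern concrete, here is a toy sketch of the two behaviors. It 
does not use DataFusion at all: `Plan`, `rows_shown_implicit`, `rows_shown_all` 
and `rows_shown_limit` are hypothetical names made up for illustration, and the 
"implicit" variant is only my reading of the behavior described above (a default 
cap of 20 rows that is skipped when the outermost operator is a limit).

```rust
// Toy stand-in for a query plan. This is NOT DataFusion's LogicalPlan;
// every name in this block is hypothetical and exists only for illustration.
#[derive(Debug)]
enum Plan {
    Scan { rows: usize },
    Limit { n: usize, input: Box<Plan> },
    Sort { input: Box<Plan> },
}

/// Number of rows the plan would produce when executed.
fn row_count(plan: &Plan) -> usize {
    match plan {
        Plan::Scan { rows } => *rows,
        Plan::Limit { n, input } => row_count(input).min(*n),
        Plan::Sort { input } => row_count(input),
    }
}

/// Assumed behavior under review: cap output at 20 rows unless the
/// outermost operator is already a limit.
fn rows_shown_implicit(plan: &Plan) -> usize {
    const DEFAULT_ROWS: usize = 20;
    match plan {
        Plan::Limit { .. } => row_count(plan),
        _ => row_count(plan).min(DEFAULT_ROWS),
    }
}

/// Suggested alternative: a plain show prints every row of the plan.
fn rows_shown_all(plan: &Plan) -> usize {
    row_count(plan)
}

/// Suggested alternative: an explicit limit that can only reduce output.
fn rows_shown_limit(plan: &Plan, limit: usize) -> usize {
    row_count(plan).min(limit)
}

fn main() {
    let limited = Plan::Limit { n: 50, input: Box::new(Plan::Scan { rows: 100 }) };
    let sorted = Plan::Sort {
        input: Box::new(Plan::Limit { n: 50, input: Box::new(Plan::Scan { rows: 100 }) }),
    };

    // The outermost operator is a limit, so the user's 50 rows are honored.
    println!("implicit, limit on top: {}", rows_shown_implicit(&limited));
    // The same limit wrapped in a sort falls back to the default cap.
    println!("implicit, sort on top:  {}", rows_shown_implicit(&sorted));
    // With explicit methods the result does not depend on plan shape.
    println!("show all, sort on top:  {}", rows_shown_all(&sorted));
    println!("show_limit(5):          {}", rows_shown_limit(&sorted, 5));
}
```

   In this toy model the implicit variant returns 50 rows when the limit is the 
outermost operator but only 20 once the same limit sits under a sort, whereas the 
explicit variants give the same answer regardless of plan shape.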




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
