alamb commented on code in PR #7088:
URL: https://github.com/apache/arrow-datafusion/pull/7088#discussion_r1273630091


##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -1035,15 +1039,70 @@ async fn unnest_columns() -> Result<()> {
     // Test aggregate results for points and tags.
     let df = table_with_nested_types(NUM_ROWS).await?;
     let count = df
-        .unnest_column("points")?
-        .unnest_column("tags")?
+        .unnest_column("points", preserve_nulls)?
+        .unnest_column("tags", preserve_nulls)?
         .count()
         .await?;
     assert_eq!(count, results.iter().map(|r| r.num_rows()).sum::<usize>());
 
     Ok(())
 }
 
+#[tokio::test]
+async fn unnest_column_nulls() -> Result<()> {

Review Comment:
   Here is a test that I think shows the difference between the two modes (or 
will once we implement it)



##########
datafusion/core/src/dataframe.rs:
##########
@@ -162,9 +164,10 @@ impl DataFrame {
     /// # Ok(())
     /// # }
     /// ```
-    pub fn unnest_column(self, column: &str) -> Result<DataFrame> {
+    /// [Unnest]: crate::logical_expr::Unnest
+    pub fn unnest_column(self, column: &str, preserve_nulls: bool) -> 
Result<DataFrame> {

Review Comment:
   I think this is different than what @vincev proposed in  
https://github.com/apache/arrow-datafusion/pull/7044#discussion_r1271325311
   
   Specifically, there are three ways to make unnest plans:
   1. `unnest` function
   2. `Dataframe::unnest`
   3. `LogicalPlanBuilder`
   
   It seemed inconsistent to add a `preserve_nulls` argument in only one place
   
   An alternative would be to add new `unest_with_options`, 
`DataFrame::unest_with_options` and `LogicalPlanBuilder::with_options` and 
leave the existing signatures alone.
   
   What do reviewers think?



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1762,7 +1767,47 @@ pub enum Partitioning {
     DistributeBy(Vec<Expr>),
 }
 
-/// Unnest a column that contains a nested list type.
+/// Unnest a column that contains a nested list type, replicating
+/// values in the other, non nested rows.
+///
+/// Conceptually this operation is like joining each row with all the

Review Comment:
   Here is the proposed behavior



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to