Nagato-Yuzuru commented on code in PR #22702:
URL: https://github.com/apache/datafusion/pull/22702#discussion_r3342615087


##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -2527,6 +2528,78 @@ impl DataFrame {
             .collect()
     }
 
+    /// Fill NaN values in specified columns with a given value
+    /// If no columns are specified (empty vector), applies to all columns
+    /// Only fills if the value can be cast to the column's type
+    ///
+    /// # Arguments
+    /// * `value` - Value to fill NaNs with
+    /// * `columns` - List of column names to fill. If empty, fills all columns.
+    ///
+    /// # Example
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # use datafusion_common::ScalarValue;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let ctx = SessionContext::new();
+    /// let df = ctx
+    ///     .read_csv("tests/data/example.csv", CsvReadOptions::new())
+    ///     .await?;
+    /// // Fill NaN in only columns "a" and "c":
+    /// let df = df.fill_nan(ScalarValue::from(0.0), vec!["a".to_owned(), "c".to_owned()])?;
+    /// // Fill NaN across all columns:
+    /// let df = df.fill_nan(ScalarValue::from(0.0), vec![])?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    #[expect(clippy::needless_pass_by_value)]
+    pub fn fill_nan(
+        &self,
+        value: ScalarValue,
+        columns: Vec<String>,
+    ) -> Result<DataFrame> {

Review Comment:
   Thanks for the review. I modeled `fill_nan` on the existing [fill_null](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.fill_null), which also takes `Vec<String>`, so changing only `fill_nan` would leave the two siblings inconsistent.
   
   I agree that the signatures should be better aligned, but changing `fill_null`'s signature is a breaking change (e.g. for existing calls such as `df.fill_null(val, vec!["a".to_string()])`). So maybe we shouldn't bundle it into this PR? We could keep `fill_nan` matching `fill_null` here and migrate both in a follow-up.
   
   Also, `unnest_columns` uses `&[&str]` while `drop_columns` uses `&[impl Into<Column>]`. Do we have a preference for which to standardize on?
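
To make the trade-off concrete, here is a minimal, self-contained sketch of the three argument styles mentioned above. It does not use DataFusion: the `Column` struct and the `select_owned`, `select_strs`, and `select_into` functions are hypothetical stand-ins that only mirror the parameter shapes under discussion, not the real method bodies or signatures.

```rust
// Hypothetical stand-in for a column reference; NOT DataFusion's real `Column`.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

impl From<&str> for Column {
    fn from(name: &str) -> Self {
        Column { name: name.to_owned() }
    }
}

impl From<String> for Column {
    fn from(name: String) -> Self {
        Column { name }
    }
}

// Style 1: owned `Vec<String>`, the shape `fill_nan` and `fill_null` use in this PR.
fn select_owned(columns: Vec<String>) -> Vec<Column> {
    columns.into_iter().map(Column::from).collect()
}

// Style 2: borrowed `&[&str]`, the shape mentioned for `unnest_columns`.
fn select_strs(columns: &[&str]) -> Vec<Column> {
    columns.iter().map(|c| Column::from(*c)).collect()
}

// Style 3: generic over `Into<Column>`, the shape mentioned for `drop_columns`.
// `Clone` is required because the slice is only borrowed while `into()` consumes.
fn select_into<T: Into<Column> + Clone>(columns: &[T]) -> Vec<Column> {
    columns.iter().cloned().map(Into::<Column>::into).collect()
}

fn main() {
    // Style 1 forces callers to allocate owned strings.
    let owned = select_owned(vec!["a".to_owned(), "c".to_owned()]);
    // Style 2 accepts string literals directly.
    let strs = select_strs(&["a", "c"]);
    // Style 3 accepts both `&str` and `String` elements.
    let from_strs = select_into(&["a", "c"]);
    let from_strings = select_into(&["a".to_owned(), "c".to_owned()]);

    assert_eq!(owned, strs);
    assert_eq!(strs, from_strs);
    assert_eq!(from_strs, from_strings);

    // Why a signature change breaks callers: an existing call written for
    // style 1, such as `select_strs(vec!["a".to_string()])`, fails to compile
    // against style 2 with a mismatched-types error.
    println!("{:?}", owned);
}
```

In this sketch, style 3 is the most permissive at the call site, at the cost of a `Clone` bound and a generic signature; style 2 is the simplest but accepts only string slices.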
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

