Re: [PR] fix(pandas): resolve SettingWithCopyWarning across post-processing utils [superset]

via GitHub Tue, 05 May 2026 10:48:08 -0700


Copilot commented on code in PR #39293:
URL: https://github.com/apache/superset/pull/39293#discussion_r3190455883



##########
superset/utils/pandas_postprocessing/compare.py:
##########
@@ -60,9 +60,9 @@ def compare(  # pylint: disable=too-many-arguments
         return df
 
     for s_col, c_col in zip(source_columns, compare_columns, strict=False):
-        s_df = df.loc[:, [s_col]]
+        s_df = df.loc[:, [s_col]].copy()
         s_df.rename(columns={s_col: "__intermediate"}, inplace=True)
-        c_df = df.loc[:, [c_col]]
+        c_df = df.loc[:, [c_col]].copy()
         c_df.rename(columns={c_col: "__intermediate"}, inplace=True)

Review Comment:
   Now that `s_df`/`c_df` are explicitly copied, using `rename(..., 
inplace=True)` is unnecessary and tends to be discouraged in pandas (it 
complicates call chaining and can be less predictable with future pandas 
behavior). Consider switching to non-inplace renames (e.g., `s_df = 
s_df.rename(...)`) for clarity and consistency.



##########
superset/utils/pandas_postprocessing/boxplot.py:
##########
@@ -125,6 +125,8 @@ def outliers(series: Series) -> set[float]:
 
     # nanpercentile needs numeric values, otherwise the isnan function
     # that's used in the underlying function will fail
+    if any(df.dtypes[column] == np.object_ for column in metrics):
+        df = df.copy()
     for column in metrics:
         if df.dtypes[column] == np.object_:
             df[column] = to_numeric(df[column], errors="coerce")

Review Comment:
   The dtype check is performed twice (once in `any(...)` and again inside the 
loop), which duplicates logic and makes it easier for the conditions to drift 
over time. Consider computing the list of object-typed metric columns once 
(preferably using `pandas.api.types.is_object_dtype`), then (1) copy only if 
that list is non-empty and (2) iterate only over that list for conversion.



##########
superset/utils/pandas_postprocessing/histogram.py:
##########
@@ -54,6 +54,7 @@ def histogram(
         return df
 
     # convert to numeric, coercing errors to NaN
+    df = df.copy()
     df[column] = to_numeric(df[column], errors="coerce")

Review Comment:
   Since this PR’s goal is specifically to eliminate `SettingWithCopyWarning`, 
it would be good to add a regression test that exercises these utilities with 
an actual slice/view input (e.g., `df = original_df[mask]`) and asserts no 
`SettingWithCopyWarning` is emitted during post-processing (via 
`warnings.catch_warnings` / pytest warning assertions). This helps prevent 
future reintroductions of the warning while keeping output unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(pandas): resolve SettingWithCopyWarning across post-processing utils [superset]

Reply via email to