Copilot commented on code in PR #39293:
URL: https://github.com/apache/superset/pull/39293#discussion_r3190455883
##########
superset/utils/pandas_postprocessing/compare.py:
##########
@@ -60,9 +60,9 @@ def compare( # pylint: disable=too-many-arguments
return df
for s_col, c_col in zip(source_columns, compare_columns, strict=False):
- s_df = df.loc[:, [s_col]]
+ s_df = df.loc[:, [s_col]].copy()
s_df.rename(columns={s_col: "__intermediate"}, inplace=True)
- c_df = df.loc[:, [c_col]]
+ c_df = df.loc[:, [c_col]].copy()
c_df.rename(columns={c_col: "__intermediate"}, inplace=True)
Review Comment:
Now that `s_df`/`c_df` are explicitly copied, using `rename(...,
inplace=True)` is unnecessary and tends to be discouraged in pandas (it
complicates call chaining and can be less predictable with future pandas
behavior). Consider switching to non-inplace renames (e.g., `s_df =
s_df.rename(...)`) for clarity and consistency.
##########
superset/utils/pandas_postprocessing/boxplot.py:
##########
@@ -125,6 +125,8 @@ def outliers(series: Series) -> set[float]:
# nanpercentile needs numeric values, otherwise the isnan function
# that's used in the underlying function will fail
+ if any(df.dtypes[column] == np.object_ for column in metrics):
+ df = df.copy()
for column in metrics:
if df.dtypes[column] == np.object_:
df[column] = to_numeric(df[column], errors="coerce")
Review Comment:
The dtype check is performed twice (once in `any(...)` and again inside the
loop), which duplicates logic and makes it easier for the conditions to drift
over time. Consider computing the list of object-typed metric columns once
(preferably using `pandas.api.types.is_object_dtype`), then (1) copy only if
that list is non-empty and (2) iterate only over that list for conversion.
##########
superset/utils/pandas_postprocessing/histogram.py:
##########
@@ -54,6 +54,7 @@ def histogram(
return df
# convert to numeric, coercing errors to NaN
+ df = df.copy()
df[column] = to_numeric(df[column], errors="coerce")
Review Comment:
Since this PR’s goal is specifically to eliminate `SettingWithCopyWarning`,
it would be good to add a regression test that exercises these utilities with
an actual slice/view input (e.g., `df = original_df[mask]`) and asserts no
`SettingWithCopyWarning` is emitted during post-processing (via
`warnings.catch_warnings` / pytest warning assertions). This helps prevent
future reintroductions of the warning while keeping output unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]