codeant-ai-for-open-source[bot] commented on PR #36605: URL: https://github.com/apache/superset/pull/36605#issuecomment-3653090044
## Nitpicks 🔍 <table> <tr><td>🔒 <strong>No security issues identified</strong></td></tr> <tr><td>⚡ <strong>Recommended areas for review</strong><br><br> - [ ] <a href='https://github.com/apache/superset/pull/36605/files#diff-4fbc6dbbdd06680aff582fe2a2c59444368ada935b1af37c0646156d638a614aR126-R131'><strong>Unnecessary conversions / better numeric check</strong></a><br>The code converts only when the dtype equals object; there are other non-numeric dtypes (e.g., category, string) that may still need coercion for `nanpercentile`. Consider using `pandas.api.types.is_numeric_dtype` to skip already-numeric columns and convert all non-numeric types.<br> - [ ] <a href='https://github.com/apache/superset/pull/36605/files#diff-4fbc6dbbdd06680aff582fe2a2c59444368ada935b1af37c0646156d638a614aR128-R131'><strong>Missing column guard</strong></a><br>The loop iterates over `metrics` and indexes `df.dtypes[column]` / `df[column]` without checking that `column` exists in `df`. If `metrics` contains a name not present in the DataFrame this will raise a KeyError. Add an existence check before accessing the column.<br> - [ ] <a href='https://github.com/apache/superset/pull/36605/files#diff-4fbc6dbbdd06680aff582fe2a2c59444368ada935b1af37c0646156d638a614aR128-R130'><strong>Deprecated dtype check</strong></a><br>The code checks `df.dtypes[column] == np.object_`, which relies on `np.object_` (deprecated in recent NumPy). This can cause warnings or break with newer NumPy versions. Prefer using pandas type-check helpers (e.g., `pandas.api.types.is_object_dtype` or `is_numeric_dtype`) or `object`/`pd.Series.dtype` comparisons.<br> - [ ] <a href='https://github.com/apache/superset/pull/36605/files#diff-d51ae5c7b4b86566f46f9da12fdfe1e2975a390ac0b1c650571aa559b21fce25R56-R61'><strong>Coercion -> ValueError behaviour</strong></a><br>`to_numeric(..., errors="coerce")` will turn non-numeric values into NaN which immediately triggers the following `if df[column].isna().any(): raise ValueError(...)`. Verify that raising here is the intended behaviour for downstream callers (vs. dropping/coercing those rows or returning an empty histogram).<br> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
