korbit-ai[bot] commented on code in PR #35586:
URL: https://github.com/apache/superset/pull/35586#discussion_r2417765814
##########
superset/commands/database/uploaders/csv_reader.py:
##########
@@ -326,6 +326,26 @@ def _cast_column_types(
CSVReader._cast_single_column(df, column, dtype, kwargs)
return df
+ @staticmethod
+ def _split_types(types: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
+ """
+ Split column data types into custom and pandas-native types.
+
+ :param types: Dictionary mapping column names to data types
+ :return: Tuple of (custom_types, pandas_types) dictionaries
+ """
+ pandas_types = {
+ col: dtype
+ for col, dtype in types.items()
+ if dtype in ("str", "object", "string")
+ }
Review Comment:
### Incomplete pandas-native type classification
<details>
<summary>Tell me more</summary>
###### What is the issue?
The `_split_types` method treats only the string-like dtypes (`str`, `object`,
`string`) as pandas-native. It excludes numeric dtypes such as `int64`,
`float64`, `int32`, and `float32`, which are also native pandas dtypes that
pandas could handle directly.
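To make the misclassification concrete, the sketch below copies the `pandas_types` comprehension from the diff into a standalone function. The hunk ends before `custom_types` is built, so deriving it as "every column not in `pandas_types`" is an assumption; the function name `split_types_as_in_pr` and the sample column names are hypothetical.

```python
def split_types_as_in_pr(
    types: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    # Same filter as the diff: only string-like dtypes count as pandas-native
    pandas_types = {
        col: dtype
        for col, dtype in types.items()
        if dtype in ("str", "object", "string")
    }
    # Assumption (not shown in the diff): every other column is a custom type
    custom_types = {
        col: dtype for col, dtype in types.items() if col not in pandas_types
    }
    return custom_types, pandas_types


custom, native = split_types_as_in_pr(
    {"name": "str", "quantity": "int64", "price": "float64"}
)
print(custom)  # {'quantity': 'int64', 'price': 'float64'}
print(native)  # {'name': 'str'}
```

Both numeric columns land in the custom bucket even though pandas supports those dtypes.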
###### Why this matters
Every numeric column is classified as a custom type and cast manually, which
gives up the performance and reliability of pandas' native dtype handling. In
particular, values in scientific notation (e.g. `1.5e+3`), which pandas can
parse natively, needlessly go through custom casting.
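For reference, pandas parses scientific notation itself when a float dtype is requested at read time, which is the native handling referred to here. A minimal sketch with a made-up `amount` column:

```python
import io

import pandas as pd

# Hypothetical CSV content with values written in scientific notation
csv_data = io.StringIO("amount\n1.5e+3\n2e-1\n")

# Requesting float64 up front lets pandas' CSV parser convert the values directly
df = pd.read_csv(csv_data, dtype={"amount": "float64"})
print(df["amount"].tolist())  # [1500.0, 0.2]
print(df["amount"].dtype)  # float64
```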
###### Suggested change ∙ *Feature Preview*
Include numeric (and boolean) types in the `pandas_types` classification:
```python
# Dtypes that pandas can apply natively while parsing the CSV
pandas_native_types = {
    "str", "object", "string",
    "int64", "int32", "float64", "float32", "bool",
}
pandas_types = {
    col: dtype
    for col, dtype in types.items()
    if dtype in pandas_native_types
}
```
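A self-contained version of the suggested change, for trying it outside the `CSVReader` class. The function name `split_types_suggested` is hypothetical, and computing `custom_types` as the complement of `pandas_types` is an assumption, since the diff does not show how the method builds it.

```python
# Dtypes treated as pandas-native, per the suggested change
PANDAS_NATIVE_TYPES = frozenset(
    {"str", "object", "string", "int64", "int32", "float64", "float32", "bool"}
)


def split_types_suggested(
    types: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (custom_types, pandas_types), mirroring _split_types."""
    pandas_types = {
        col: dtype for col, dtype in types.items() if dtype in PANDAS_NATIVE_TYPES
    }
    # Assumption: columns not handled natively fall through to custom casting
    custom_types = {
        col: dtype for col, dtype in types.items() if col not in pandas_types
    }
    return custom_types, pandas_types


custom, native = split_types_suggested(
    {"name": "str", "quantity": "int64", "created": "datetime64[ns]"}
)
print(custom)  # {'created': 'datetime64[ns]'}
print(native)  # {'name': 'str', 'quantity': 'int64'}
```

Integer dtypes are the case worth exercising against the PR's scientific-notation fixtures, since a value such as `1e+5` is not written as an integer literal.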
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]