raulcd commented on code in PR #44195:
URL: https://github.com/apache/arrow/pull/44195#discussion_r1840404637


##########
dev/tasks/tasks.yml:
##########
@@ -1582,6 +1582,10 @@ tasks:
         # ensure we have at least one build with parquet encryption disabled
         PARQUET_REQUIRE_ENCRYPTION: "OFF"
       {% endif %}
+      {% if pandas_version == "nightly" %}
+        # TODO can be removed once this is enabled by default in pandas >= 3

Review Comment:
   I was confused about where we were using this and realized it is a pandas 
thing. Maybe we can point that out in the comment?
   ```suggestion
           # TODO can be removed once this is enabled by default in pandas >= 3
           # This is to enable the Pandas feature.
           # See: https://github.com/pandas-dev/pandas/pull/58459
   ```



##########
python/pyarrow/pandas_compat.py:
##########
@@ -842,12 +844,25 @@ def _get_extension_dtypes(table, columns_metadata, types_mapper=None):
     and then we can check if this dtype supports conversion from arrow.
 
     """
+    strings_to_categorical = options["strings_to_categorical"]
+    categories = categories or []
+
     ext_columns = {}
 
     # older pandas version that does not yet support extension dtypes
     if _pandas_api.extension_dtype is None:
         return ext_columns
 
+    # for pandas 3.0+, use pandas' new default string dtype
+    if _pandas_api.uses_string_dtype() and not strings_to_categorical:
+        for field in table.schema:
+            if (
+                pa.types.is_string(field.type)
+                or pa.types.is_large_string(field.type)
+                or pa.types.is_string_view(field.type)
+            ) and field.name not in categories:

Review Comment:
   I am curious how categories were interpreted before we started inferring 
the new string type. Was this just not taken into account on the Arrow side?
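
   For reference, the column selection in this hunk can be sketched roughly 
as below. This is a simplified illustration and not the PR's code: the 
function name `columns_for_string_dtype`, the `uses_string_dtype` parameter, 
and the use of plain type-name strings in place of `pa.types.is_*` checks are 
all assumptions made to keep the example self-contained.

```python
# Hypothetical stand-in for the selection logic in the hunk above.
# Field types are plain strings here rather than real pyarrow types.
STRING_LIKE = {"string", "large_string", "string_view"}


def columns_for_string_dtype(
    fields,
    categories=None,
    strings_to_categorical=False,
    uses_string_dtype=True,
):
    """Return the names of columns that would get the pandas string dtype.

    `fields` is a list of (name, type_name) pairs standing in for
    `table.schema`.
    """
    categories = categories or []

    # Mirrors the guard in the diff: only applies when pandas uses the new
    # string dtype and the caller did not ask for categorical strings.
    if not uses_string_dtype or strings_to_categorical:
        return []

    return [
        name
        for name, type_name in fields
        if type_name in STRING_LIKE and name not in categories
    ]


fields = [("a", "string"), ("b", "int64"), ("c", "large_string")]
selected = columns_for_string_dtype(fields, categories=["c"])
print(selected)
```

   In this sketch, a string column listed in `categories` is skipped, so it 
would not be assigned the new string dtype, which is the case the question 
above is about.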


