jorisvandenbossche commented on code in PR #35656:
URL: https://github.com/apache/arrow/pull/35656#discussion_r1205199261


##########
python/pyarrow/types.pxi:
##########
@@ -1122,6 +1150,19 @@ cdef class DurationType(DataType):
         """
         return timeunit_to_string(self.duration_type.unit())
 
+    def to_pandas_dtype(self):
+        """
+        Return the equivalent NumPy / Pandas dtype.
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> d = pa.duration('ms')
+        >>> d.to_pandas_dtype()
+        timedelta64[ms]
+        """
+        return _get_pandas_type(_Type_TIMESTAMP, self.unit)

Review Comment:
   Should this use _Type_DURATION instead? 
   (which also raises the question: is this tested? Although I suppose you 
haven't yet bothered with updating all the tests)



##########
python/pyarrow/types.pxi:
##########
@@ -40,10 +42,20 @@ cdef dict _pandas_type_map = {
     _Type_HALF_FLOAT: np.float16,
     _Type_FLOAT: np.float32,
     _Type_DOUBLE: np.float64,
-    _Type_DATE32: np.dtype('datetime64[ns]'),
-    _Type_DATE64: np.dtype('datetime64[ns]'),
-    _Type_TIMESTAMP: np.dtype('datetime64[ns]'),
-    _Type_DURATION: np.dtype('timedelta64[ns]'),
+    _Type_DATE32: np.dtype('datetime64[D]'),

Review Comment:
   pandas only supports the range of second to nanoseconds, so for dates we 
should maybe still default to `datetime64[s]`? (otherwise I assume this 
conversion would happen anyway on the pandas side)



##########
python/pyarrow/array.pxi:
##########
@@ -1674,8 +1682,11 @@ cdef _array_like_to_pandas(obj, options, types_mapper):
     original_type = obj.type
     name = obj._name
 
-    # ARROW-3789(wesm): Convert date/timestamp types to datetime64[ns]
-    c_options.coerce_temporal_nanoseconds = True
+    # ARROW-33321 reenables support for date/timestamp conversion in pandas >= 
2.0
+    from pyarrow.vendored.version import Version
+    if pandas_api.loose_version < Version('2.0.0'):
+        # ARROW-3789(wesm): Convert date/timestamp types to datetime64[ns]
+        c_options.coerce_temporal_nanoseconds = True

Review Comment:
   Should this only be set of the argument was _not_ specified by the user? Or 
are we fine with forcing this in case of pandas<2.0, since that's what we did 
in the past anyway, and is also the only useful behaviour (letting the user 
specify `coerce_temporal_nanoseconds=False` in a conversion to pandas with 
pandas<2.0 doesn't make much sense, I suppose, since then pandas will convert 
it to nanoseconds anyhow, and the user never sees the non-nanosecond data). So 
after writing this: I assume the above is fine ;)



##########
python/pyarrow/array.pxi:
##########
@@ -775,6 +776,11 @@ cdef class _PandasConvertible(_Weakrefable):
             expected to return a pandas ExtensionDtype or ``None`` if the
             default conversion should be used for that type. If you have
             a dictionary mapping, you can pass ``dict.get`` as function.
+        coerce_temporal_nanoseconds : bool, default False
+            A legacy option to coerce date32, date64, datetime, and timestamp
+            types to use nanoseconds when converting to pandas. This was the
+            default behavior in pandas version 1.x. In pandas version 2.0,
+            non-nanosecond time units are now supported.

Review Comment:
   We should probably clarify that this keyword is ignored for conversions to 
pandas with pandas<2.0, and force set to True (regardless of what the user 
would specify here)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to