jorisvandenbossche commented on code in PR #35656:
URL: https://github.com/apache/arrow/pull/35656#discussion_r1230556348


##########
python/pyarrow/array.pxi:
##########
@@ -721,12 +722,12 @@ cdef class _PandasConvertible(_Weakrefable):
         integer_object_nulls : bool, default False
             Cast integers with nulls to objects
         date_as_object : bool, default True
-            Cast dates to objects. If False, convert to datetime64[ns] dtype.
+            Cast dates to objects. If False, convert to datetime64 dtype.
         timestamp_as_object : bool, default False
             Cast non-nanosecond timestamps (np.datetime64) to objects. This is
             useful if you have timestamps that don't fit in the normal date
             range of nanosecond timestamps (1678 CE-2262 CE).

Review Comment:
   ```suggestion
               range of nanosecond timestamps (1678 CE-2262 CE) with pandas 
version older than 2.0.
   ```
   
   or something like that. I think it would be useful to clarify that this 
keyword now is less important / this range limitation is only relevant for 
older pandas.



##########
python/pyarrow/array.pxi:
##########
@@ -721,12 +722,12 @@ cdef class _PandasConvertible(_Weakrefable):
         integer_object_nulls : bool, default False
             Cast integers with nulls to objects
         date_as_object : bool, default True
-            Cast dates to objects. If False, convert to datetime64[ns] dtype.
+            Cast dates to objects. If False, convert to datetime64 dtype.

Review Comment:
   Do we want to specify this is [ns] for older pandas and [ms] for pandas >= 
2.0?



##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -1199,13 +1223,25 @@ def test_table_convert_date_as_object(self):
 
         table = pa.Table.from_pandas(df, preserve_index=False)
 
-        df_datetime = table.to_pandas(date_as_object=False)
+        df_datetime = table.to_pandas(date_as_object=False,
+                                      coerce_temporal_nanoseconds=coerce_to_ns)
         df_object = table.to_pandas()
 
-        tm.assert_frame_equal(df.astype('datetime64[ns]'), df_datetime,
+        tm.assert_frame_equal(df.astype(expected_type), df_datetime,
                               check_dtype=True)
         tm.assert_frame_equal(df, df_object, check_dtype=True)
 
+    def test_table_coerce_temporal_nanoseconds(self):
+        df = pd.DataFrame({'date': [date(2000, 1, 1)]}, dtype='datetime64[ms]')
+        table = pa.Table.from_pandas(df)

Review Comment:
   We should maybe parametrize this test with using date32/date64/timestamp? Or 
is the date32/64 case tested elsewhere?



##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -1569,7 +1572,31 @@ class DatetimeWriter : public 
TypedPandasWriter<NPY_DATETIME> {
 };
 
 using DatetimeSecondWriter = DatetimeWriter<TimeUnit::SECOND>;
-using DatetimeMilliWriter = DatetimeWriter<TimeUnit::MILLI>;
+
+class DatetimeMilliWriter : public DatetimeWriter<TimeUnit::MILLI> {
+ public:
+  using DatetimeWriter<TimeUnit::MILLI>::DatetimeWriter;
+
+  Status CopyInto(std::shared_ptr<ChunkedArray> data, int64_t rel_placement) 
override {
+    Type::type type = data->type()->id();
+    int64_t* out_values = this->GetBlockColumnStart(rel_placement);
+    if (type == Type::DATE32) {
+      // Convert from days since epoch to datetime64[ms]
+      ConvertDatetimeLikeNanos<int32_t, 86400000L>(*data, out_values);

Review Comment:
   Ah, yes, when milliseconds is the target this is probably fine. For the 
nanoseconds case, though, this gives wrong results. Opened 
https://github.com/apache/arrow/issues/36084 about that.



##########
python/pyarrow/array.pxi:
##########
@@ -775,6 +776,13 @@ cdef class _PandasConvertible(_Weakrefable):
             expected to return a pandas ExtensionDtype or ``None`` if the
             default conversion should be used for that type. If you have
             a dictionary mapping, you can pass ``dict.get`` as function.
+        coerce_temporal_nanoseconds : bool, default False
+            Only applicable to pandas version >= 2.0.
+            A legacy option to coerce date32, date64, datetime, and timestamp

Review Comment:
   ```suggestion
               A legacy option to coerce date32, date64, and timestamp
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to