[ https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430574#comment-16430574 ]

ASF GitHub Bot commented on ARROW-2391:
---------------------------------------

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180106203
 
 

 ##########
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##########
 @@ -396,21 +396,34 @@ struct CastFunctor<Date64Type, TimestampType> {
     ShiftTime<int64_t, int64_t>(ctx, options, conversion.first, conversion.second, input,
                                 output);
 
-    internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-                                      input.length);
+    if (input.null_count != 0) {
+      internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+                                        input.length);
 
-    // Ensure that intraday milliseconds have been zeroed out
-    auto out_data = GetMutableValues<int64_t>(output, 1);
-    for (int64_t i = 0; i < input.length; ++i) {
-      const int64_t remainder = out_data[i] % kMillisecondsInDay;
-      if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
-                              remainder > 0)) {
-        ctx->SetStatus(
-            Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
-        break;
+      // Ensure that intraday milliseconds have been zeroed out
+      auto out_data = GetMutableValues<int64_t>(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
+                                remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
+        bit_reader.Next();
+      }
+    } else {
+      auto out_data = GetMutableValues<int64_t>(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
 
 Review comment:
   I might misunderstand, but:
   
   ```python
   # with allow_time_truncate
   [
       '2018-05-10T00:00:00',
       '2018-05-11T00:00:00',
       '2018-05-12T10:24:01',
   ]  # OK
   
   # without allow_time_truncate
   [
       '2018-05-10T00:00:00',
       '2018-05-11T00:00:00',
       '2018-05-12T10:24:01',  # <- fails here
   ]  
   
   # with allow_time_truncate
   [
       '2018-05-10T00:00:00',
       '2018-05-11T00:00:00',
       '2018-05-12T00:00:00',
   ]  # OK
   
   # without allow_time_truncate
   [
       '2018-05-10T00:00:00',
       '2018-05-11T00:00:00',
       '2018-05-12T00:00:00',
   ]  # OK - this would fail if I test outside the loop
   
   
   ```
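
Reading the patch against the review comment: the kernel now consults the null bitmap only when `input.null_count != 0`, and Arrow allows an array with no nulls to omit its validity buffer, so dereferencing `input.buffers[0]` unconditionally is a plausible source of the reported segfault. The following is a minimal pure-Python model of the patched loop, not Arrow code: `build_validity_bitmap`, `is_set` and `cast_to_date64` are hypothetical names, `data` is assumed to be already sliced, and `offset` is applied to the bitmap only. It folds both C++ branches into one loop and replays the review comment's arrays.

```python
from typing import List, Optional

MS_PER_DAY = 86_400_000  # mirrors kMillisecondsInDay
DAY = MS_PER_DAY


def build_validity_bitmap(values: List[Optional[int]]) -> Optional[bytes]:
    """LSB-ordered validity bitmap, or None when no slot is null.

    Mimics Arrow, where an array with null_count == 0 may omit buffers[0].
    """
    if all(v is not None for v in values):
        return None
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_set(bitmap: bytes, offset: int, i: int) -> bool:
    """Read validity bit (offset + i), as a bitmap reader started at `offset` would."""
    bit = offset + i
    return bool((bitmap[bit // 8] >> (bit % 8)) & 1)


def cast_to_date64(data: List[int], bitmap: Optional[bytes],
                   allow_time_truncate: bool, offset: int = 0) -> List[int]:
    """Hypothetical model of the patched loop: truncate every slot, but only
    validate valid slots. The bitmap is touched only if it exists."""
    out = list(data)
    for i, value in enumerate(data):
        remainder = value % MS_PER_DAY
        valid = bitmap is None or is_set(bitmap, offset, i)
        if not allow_time_truncate and valid and remainder > 0:
            raise ValueError("Timestamp value had non-zero intraday milliseconds")
        out[i] = value - remainder
    return out


# The review comment's arrays: 2018-05-10, -11 and -12 are days 17661-17663.
midnights = [17661 * DAY, 17662 * DAY, 17663 * DAY]
with_time = [17661 * DAY, 17662 * DAY, 17663 * DAY + 37_441_000]  # ...T10:24:01

assert cast_to_date64(with_time, None, allow_time_truncate=True) == midnights
assert cast_to_date64(midnights, None, allow_time_truncate=True) == midnights
assert cast_to_date64(midnights, None, allow_time_truncate=False) == midnights
try:
    cast_to_date64(with_time, None, allow_time_truncate=False)
except ValueError as exc:
    print(exc)  # Timestamp value had non-zero intraday milliseconds

# A null slot may hold junk; it is truncated but never validated.
values = [17661 * DAY, None, 17663 * DAY]
bitmap = build_validity_bitmap(values)  # b'\x05': slots 0 and 2 are valid
data = [v if v is not None else 12_345 for v in values]
print(cast_to_date64(data, bitmap, allow_time_truncate=False))
# [1525910400000, 0, 1526083200000]
```

Folding the branches into a single `bitmap is None or ...` test keeps the model short; the C++ patch duplicates the loop instead, presumably so the no-null path avoids a per-element bitmap check.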

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2391
>                 URL: https://issues.apache.org/jira/browse/ARROW-2391
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>         Environment: Mac OS High Sierra
> Python 3.6
>            Reporter: Dave Challis
>            Priority: Major
>              Labels: pull-request-available
>
> Calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and an explicit
> `pyarrow.Schema` causes a segmentation fault when the schema maps a Pandas
> `datetime64[ns]` column to the `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the Python interpreter to exit with
> "Segmentation fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
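
For context on what the failing conversion has to compute: a `date64` value counts milliseconds since the UNIX epoch and is expected to be evenly divisible by one day, so casting the example's timestamp must either drop the intraday part or reject it. The following pure-Python sketch (standard library only; `to_epoch_ms` and `timestamp_ms_to_date64` are hypothetical helpers, not PyArrow functions) does that arithmetic for the reporter's value, treating the naive timestamp as UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000  # same role as kMillisecondsInDay in cast.cc


def to_epoch_ms(iso: str) -> int:
    """Interpret a naive ISO-8601 timestamp as UTC; return integer ms since the epoch."""
    dt = datetime.fromisoformat(iso).replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def timestamp_ms_to_date64(ms: int, allow_time_truncate: bool) -> int:
    """Hypothetical helper: snap a millisecond timestamp to the start of its day.

    Python's % floors, so pre-1970 values round down to the earlier midnight;
    C++ % truncates toward zero, so this differs from cast.cc for negative inputs.
    """
    remainder = ms % MS_PER_DAY
    if remainder > 0 and not allow_time_truncate:
        raise ValueError("Timestamp value had non-zero intraday milliseconds")
    return ms - remainder


ms = to_epoch_ms("2018-05-10T10:24:01")
print(ms % MS_PER_DAY)  # 37441000, i.e. 10:24:01 after midnight
print(timestamp_ms_to_date64(ms, allow_time_truncate=True))  # 1525910400000
```

Calling it with `allow_time_truncate=False` raises `ValueError` instead, the sketch's analogue of the `Status::Invalid` error in the kernel.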



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
