[ 
https://issues.apache.org/jira/browse/ARROW-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382750#comment-16382750
 ] 

ASF GitHub Bot commented on ARROW-2135:
---------------------------------------

cpcloud commented on a change in pull request #1681: ARROW-2135: [Python] Fix 
NaN conversion when casting from Numpy array
URL: https://github.com/apache/arrow/pull/1681#discussion_r171711960
 
 

 ##########
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##########
 @@ -113,6 +145,55 @@ inline int64_t ValuesToBitmap(PyArrayObject* arr, 
uint8_t* bitmap) {
   return null_count;
 }
 
+class NumPyNullsConverter {
+ public:
+  /// Convert the given array's null values to a null bitmap.
+  /// The null bitmap is only allocated if null values are ever possible.
+  static Status Convert(MemoryPool* pool, PyArrayObject* arr,
+                        bool use_pandas_null_sentinels,
+                        std::shared_ptr<ResizableBuffer>* out_null_bitmap_,
+                        int64_t* out_null_count) {
+    NumPyNullsConverter converter(pool, arr, use_pandas_null_sentinels);
+    RETURN_NOT_OK(VisitNumpyArrayInline(arr, &converter));
+    *out_null_bitmap_ = converter.null_bitmap_;
+    *out_null_count = converter.null_count_;
+    return Status::OK();
+  }
+
+  template <int TYPE>
+  Status Visit(PyArrayObject* arr) {
+    typedef internal::npy_traits<TYPE> traits;
+
+    const bool null_sentinels_possible =
+        // Always treat Numpy's NaT as null
+        TYPE == NPY_DATETIME ||
 
 Review comment:
   AFAIU There's no other way to interpret `NaT` other than `NULL` (unless 
there's a standard that defines it in a different way than "missing"). nan is 
part of the IEEE floating point specification (as I'm sure you know) and it has 
a different meaning than null.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] NaN values silently casted to int64 when passing explicit schema for 
> conversion in Table.from_pandas
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2135
>                 URL: https://issues.apache.org/jira/browse/ARROW-2135
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Matthew Gilbert
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> If you create a {{Table}} from a {{DataFrame}} of ints with a NaN value the 
> NaN is improperly cast. Since pandas casts these to floats, when converted to 
> a table the NaN is interpreted as an integer. This seems like a bug since a 
> known limitation in pandas (the inability to have null valued integers data) 
> is taking precedence over arrow's functionality to store these as an IntArray 
> with nulls.
>  
> {code}
> import pyarrow as pa
> import pandas as pd
> df = pd.DataFrame({"a":[1, 2, pd.np.NaN]})
> schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
> table = pa.Table.from_pandas(df, schema=schema)
> table[0]
> <pyarrow.lib.Column object at 0x7f2151d19c90>
> chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
> [
>   1,
>   2,
>   -9223372036854775808
> ]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to