amol- commented on a change in pull request #10101:
URL: https://github.com/apache/arrow/pull/10101#discussion_r616743857



##########
File path: python/pyarrow/array.pxi
##########
@@ -1170,7 +1170,13 @@ cdef class Array(_PandasConvertible):
         array = PyObject_to_object(out)
 
         if isinstance(array, dict):
-            array = np.take(array['dictionary'], array['indices'])
+            if zero_copy_only or not self.null_count:
+                # zero_copy doesn't allow for nulls to be in the array
+                array = np.take(array['dictionary'], array['indices'])
+            else:
+                missings = array["indices"] < 0
+                array = np.take(array['dictionary'], array['indices'])
+                array[missings] = np.NaN

Review comment:
   Thanks for catching the `np.NaN` with ints. I actually noticed that 
yesterday but forgot to deal with it when I got back to this ticket.
   
   Regarding `None` vs `np.NaN`: I think there is already an inconsistency. 
I copied the behaviour from `to_pandas`, and `to_pandas` itself gives two 
different results depending on whether it is called on an `Array` or a 
`DictionaryArray`:
   
   ```
   >>> pa.array(['a', None]).to_pandas().tolist()
   ['a', None]
   >>> pa.array(['a', None]).to_numpy(zero_copy_only=False).tolist()
   ['a', None]
   ```
   
   VS
   
   ```
   >>> pa.DictionaryArray.from_arrays(pa.array([0, None]), pa.array(['a'])).to_pandas().tolist()
   ['a', nan]
   ```
   
   So it seems that for `DictionaryArray` we already use `NaN`, not `None`, 
which is why I used `NaN` here.
   
   Should we unify the behaviour and switch `DictionaryArray.to_pandas` to 
`None` too, as `Array.to_pandas` already does?
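
To make the `None` option concrete, here is a minimal NumPy-only sketch of what the decoding step could look like. It is not the pyarrow implementation: `decode_dictionary` is a hypothetical helper, and it assumes that nulls arrive as negative indices (as in the diff above) and that the dictionary is non-empty. Casting to `object` is what lets the sentinel fit an integer dictionary as well.

```python
import numpy as np


def decode_dictionary(dictionary, indices, missing=None):
    """Expand dictionary-encoded values (hypothetical helper, not pyarrow API).

    Assumes a null is encoded as a negative index and the dictionary is
    non-empty.
    """
    dictionary = np.asarray(dictionary)
    indices = np.asarray(indices)
    missings = indices < 0
    if not missings.any():
        # No nulls: a plain take is enough and the dtype is preserved.
        return np.take(dictionary, indices)
    # np.take would wrap negative indices around to the end of the
    # dictionary, so point them at slot 0 and overwrite them afterwards.
    safe_indices = np.where(missings, 0, indices)
    # Cast to object so the sentinel fits whatever the dictionary dtype is.
    out = np.take(dictionary, safe_indices).astype(object)
    out[missings] = missing
    return out


strings = decode_dictionary(np.array(["a"], dtype=object), np.array([0, -1]))
ints = decode_dictionary(np.array([10, 20]), np.array([1, -1, 0]))
print(strings.tolist())
print(ints.tolist())
```

With `missing=None` the string case matches what `Array.to_pandas` returns today. The cost is that any result containing nulls becomes an `object` array, which is the trade-off to weigh against keeping `NaN`.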





