[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10101: ARROW-9594: [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas

GitBox Thu, 22 Apr 2021 07:18:01 -0700


jorisvandenbossche commented on a change in pull request #10101:
URL: https://github.com/apache/arrow/pull/10101#discussion_r618363682




##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -688,6 +688,69 @@ def test_dictionary_from_numpy():
             assert d2[i].as_py() == dictionary[indices[i]]
 
 
+def test_dictionary_to_numpy():
+    expected = pa.array(
+        ["foo", "bar", None, "foo"]
+    ).to_numpy(zero_copy_only=False)
+    a = pa.DictionaryArray.from_arrays(
+        pa.array([0, 1, None, 0]),
+        pa.array(['foo', 'bar'])
+    )
+    assert (a.to_numpy(zero_copy_only=False) == expected).all()

Review comment:
       I am not sure if we are very consistent about it, but we also use 
`np.testing.assert_array_equal` for asserting numpy arrays

##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -688,6 +688,69 @@ def test_dictionary_from_numpy():
             assert d2[i].as_py() == dictionary[indices[i]]
 
 
+def test_dictionary_to_numpy():
+    expected = pa.array(
+        ["foo", "bar", None, "foo"]
+    ).to_numpy(zero_copy_only=False)
+    a = pa.DictionaryArray.from_arrays(
+        pa.array([0, 1, None, 0]),
+        pa.array(['foo', 'bar'])
+    )
+    assert (a.to_numpy(zero_copy_only=False) == expected).all()
+
+    with pytest.raises(pa.ArrowInvalid):
+        # to_numpy takes for granted that when zero_copy_only=True
+        # there will be no nulls.
+        # If that changes, nulls handling will have to be updated in to_numpy
+        # as we won't be able to rely anymore on decoding the DictionaryArray
+        # to handle nulls.

Review comment:
       I don't fully understand this comment about nulls. For this specific 
case isn't it just testing that conversion to numpy can never be done zero-copy 
for dictionary (nulls or no nulls) and is thus raising an error? (this already 
worked correctly on master as well)

##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -688,6 +688,69 @@ def test_dictionary_from_numpy():
             assert d2[i].as_py() == dictionary[indices[i]]
 
 
+def test_dictionary_to_numpy():
+    expected = pa.array(
+        ["foo", "bar", None, "foo"]
+    ).to_numpy(zero_copy_only=False)
+    a = pa.DictionaryArray.from_arrays(
+        pa.array([0, 1, None, 0]),
+        pa.array(['foo', 'bar'])
+    )
+    assert (a.to_numpy(zero_copy_only=False) == expected).all()
+
+    with pytest.raises(pa.ArrowInvalid):
+        # to_numpy takes for granted that when zero_copy_only=True
+        # there will be no nulls.
+        # If that changes, nulls handling will have to be updated in to_numpy
+        # as we won't be able to rely anymore on decoding the DictionaryArray
+        # to handle nulls.
+        a.to_numpy(zero_copy_only=True)
+
+    anonulls = pa.DictionaryArray.from_arrays(
+        pa.array([0, 1, 1, 0]),
+        pa.array(['foo', 'bar'])
+    )
+    expected = pa.array(
+        ["foo", "bar", "bar", "foo"]
+    ).to_numpy(zero_copy_only=False)

Review comment:
       ```suggestion
       expected = np.array(["foo", "bar", "bar", "foo"], dtype=object)
   ```
   
   I think we can just define the expected here? (it's also a bit simpler in 
code)
   
   (and the same for the others)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10101: ARROW-9594: [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas

Reply via email to