jorisvandenbossche commented on code in PR #40341:
URL: https://github.com/apache/arrow/pull/40341#discussion_r1524478032
##########
python/pyarrow/array.pxi:
##########
@@ -336,11 +338,21 @@ def array(object obj, type=None, mask=None, size=None,
from_pandas=None,
if pandas_api.have_pandas:
values, type = pandas_api.compat.get_datetimetz_type(
values, obj.dtype, type)
- result = _ndarray_to_array(values, mask, type, c_from_pandas, safe,
- pool)
+ if type and type.id == _Type_RUN_END_ENCODED:
+ arr = _ndarray_to_array(
+ values, mask, type.value_type, c_from_pandas, safe, pool)
+ result = run_end_encode(arr, run_end_type=type.run_end_type)
Review Comment:
You can pass the `pool` here as well
##########
python/pyarrow/array.pxi:
##########
@@ -21,6 +21,8 @@ import os
import warnings
from cython import sizeof
+from pyarrow.compute import run_end_encode
Review Comment:
Instead of importing this at module level (which isn't guaranteed to work,
since pyarrow can be built without compute support), you can call `_pc()` to
get the compute module at the point where `run_end_encode` is called (see the
other examples of that in this file).
##########
python/pyarrow/tests/test_array.py:
##########
@@ -3579,6 +3579,42 @@ def test_run_end_encoded_from_buffers():
1, offset, children)
+def test_run_end_encoded_from_array_with_type():
+ run_ends = pa.array([1, 3, 6], type=pa.int32())
+ values = pa.array([1, 2, 3], type=pa.int64())
+
+ ree_type = pa.run_end_encoded(pa.int32(), pa.int64())
+
+ arr = [1, 2, 2, 3, 3, 3]
+ result = pa.array(arr, type=ree_type)
+
+ assert result.run_ends.equals(run_ends)
+ assert result.values.equals(values)
Review Comment:
Instead of checking both run ends and values each time, I think you could
create the expected result once (`expected =
pa.RunEndEncodedArray.from_arrays(run_ends, values, ree_type)`) and then just
check `assert result.equals(expected)`.
##########
python/pyarrow/tests/test_array.py:
##########
@@ -3579,6 +3579,42 @@ def test_run_end_encoded_from_buffers():
1, offset, children)
+def test_run_end_encoded_from_array_with_type():
+ run_ends = pa.array([1, 3, 6], type=pa.int32())
+ values = pa.array([1, 2, 3], type=pa.int64())
+
+ ree_type = pa.run_end_encoded(pa.int32(), pa.int64())
+
+ arr = [1, 2, 2, 3, 3, 3]
+ result = pa.array(arr, type=ree_type)
+
+ assert result.run_ends.equals(run_ends)
+ assert result.values.equals(values)
+
+ result = pa.array(np.array(arr), type=ree_type)
Review Comment:
Can you also do one more version of this with `pa.array(arr)` as input?