mapleFU commented on PR #35351:
URL: https://github.com/apache/arrow/pull/35351#issuecomment-1527884680
I guess we need a mapping layer if we want to use `arrow::compute::Order`,
because cases like unordered would be tricky to represent.
By the way, I think it's currently a bit unsafe to use `set_sorting_order`,
because no check is performed here. I went through the code of arrow-rs and
parquet-mr, and they don't check it either. Maybe we could use `Statistic` to
check, but would that be too expensive?
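To make the statistics idea concrete, here is a rough sketch in plain Python. It assumes the check in question is whether the data is consistent with the declared ascending order. The `MinMax` pairs and `may_be_ascending` are hypothetical stand-ins for per-page or per-chunk statistics, not an Arrow or Parquet API. It can only reject an order that the statistics contradict; it cannot prove the rows inside a chunk are sorted.

```python
from typing import List, Tuple

# Hypothetical stand-in for the statistics of one chunk of a column:
# a (min, max) pair. This is NOT an Arrow or Parquet API.
MinMax = Tuple[int, int]


def may_be_ascending(stats: List[MinMax]) -> bool:
    """Return False if the min/max pairs rule out ascending order.

    This is only a necessary condition: statistics bound the values in a
    chunk but say nothing about the order of rows inside it.
    """
    prev_max = None
    for lo, hi in stats:
        if lo > hi:
            return False  # malformed statistics
        if prev_max is not None and lo < prev_max:
            return False  # chunk starts below the end of the previous one
        prev_max = hi
    return True


print(may_be_ascending([(1, 5), (5, 9), (10, 12)]))  # True
print(may_be_ascending([(1, 5), (3, 9)]))            # False
```

A check like this costs one pass over the statistics rather than over the data, so it is cheap, but it would still let through a chunk whose rows are internally unsorted.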
Also, CI failed because:
```
=================================== FAILURES ===================================
___________________ test_pandas_assertion_error_large_string ___________________
    @pytest.mark.large_memory
    @pytest.mark.pandas
    def test_pandas_assertion_error_large_string():
        # Test AssertionError as pandas does not support "U" type strings
        if Version(pd.__version__) < Version("1.5.0"):
            pytest.skip("__dataframe__ added to pandas in 1.5.0")
        data = np.array([b'x'*1024]*(3*1024**2), dtype='object')  # 3GB bytes data
        arr = pa.array(data, type=pa.large_string())
        table = pa.table([arr], names=["large_string"])
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
>       with pytest.raises(AssertionError):
E       Failed: DID NOT RAISE <class 'AssertionError'>
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:294: Failed
=============================== warnings summary ===============================
tests/test_pandas.py::TestConvertListTypes::test_to_list_of_maps_pandas
tests/test_pandas.py::TestConvertListTypes::test_to_list_of_maps_pandas_sliced
tests/test_pandas.py::test_roundtrip_nested_map_array_with_pydicts_sliced
  /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/dtypes/missing.py:571: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    if np.any(np.asarray(left_value != right_value)):
tests/test_substrait.py::test_named_table_invalid_table_name
  /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: 'pyarrow._substrait._create_named_table_provider'
  Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/test_substrait.py", line 250, in table_provider
      raise Exception("Unrecognized table name")
  Exception: Unrecognized table name
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
tests/parquet/test_datetime.py::test_list_of_datetime_time_roundtrip
  /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/parquet/test_datetime.py:361: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
    times = pd.to_datetime(['09:00', '09:30', '10:00', '10:30', '11:00',
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
```
No idea why it failed :-(