[GitHub] [arrow] mapleFU commented on pull request #35351: GH-35331: [C++][Parquet] Parquet Export Footer metadata SortColumns

via GitHub Fri, 28 Apr 2023 10:38:45 -0700


mapleFU commented on PR #35351:
URL: https://github.com/apache/arrow/pull/35351#issuecomment-1527884680


   I guess we need a mapping layer if we want `arrow::compute::Order`, because 
things like unordered would be tricky.
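
To make the mapping idea concrete, here is a rough sketch in Python rather than C++. Everything in it is hypothetical: `SortKey` and `Direction` stand in for whatever the compute layer exposes, and `SortingColumn` only mirrors the three fields of the Parquet footer struct (`column_idx`, `descending`, `nulls_first`). An unordered input maps to an empty list, i.e. nothing is written to the footer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    """Hypothetical stand-in for the compute-side sort direction."""
    ASCENDING = "ascending"
    DESCENDING = "descending"


@dataclass(frozen=True)
class SortKey:
    """Hypothetical compute-side sort key."""
    column_index: int
    direction: Direction
    nulls_first: bool = False


@dataclass(frozen=True)
class SortingColumn:
    """Mirrors the fields of the Parquet footer's SortingColumn struct."""
    column_idx: int
    descending: bool
    nulls_first: bool


def to_sorting_columns(keys: Optional[List[SortKey]]) -> List[SortingColumn]:
    """Map compute-side sort keys to footer sorting columns.

    None or an empty list means "unordered", which is represented by
    writing no sorting columns at all.
    """
    if not keys:
        return []
    return [
        SortingColumn(
            column_idx=k.column_index,
            descending=k.direction is Direction.DESCENDING,
            nulls_first=k.nulls_first,
        )
        for k in keys
    ]
```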
   
   By the way, I think it's currently a bit unsafe to use `set_sorting_order`, 
because nothing checks that the data is actually sorted. I went through the 
code of arrow-rs and parquet-mr, and they don't check it either. Maybe we could 
use `Statistic` to check, but would that be too expensive?
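
A cheap form of the statistics check could look like the sketch below. It is an assumption about how such a check might work, not existing Arrow code: each row group is reduced to a hypothetical `(min, max)` pair for the sort column, and we only compare neighbouring row groups. This catches a declared order that is provably wrong, but it cannot prove that rows inside a row group are sorted.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for per-row-group column statistics: (min, max).
# None means the row group carries no statistics for the column.
RowGroupStats = Optional[Tuple[int, int]]


def row_groups_consistent_with_order(
    stats: List[RowGroupStats], descending: bool = False
) -> bool:
    """Return False only when the statistics contradict the declared order.

    Row groups without statistics are skipped; because sortedness is
    transitive, comparing the remaining neighbours is still valid.
    """
    known = [s for s in stats if s is not None]
    for (prev_min, prev_max), (next_min, next_max) in zip(known, known[1:]):
        if descending:
            # Every value in the earlier group must be >= every later value.
            if prev_min < next_max:
                return False
        else:
            # Every value in the earlier group must be <= every later value.
            if prev_max > next_min:
                return False
    return True
```

The cost is one comparison per row group, so it is far cheaper than scanning the data, at the price of being a necessary rather than sufficient condition.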
   
   By the way, CI failed with:
   
   ```
   =================================== FAILURES ===================================
   ___________________ test_pandas_assertion_error_large_string ___________________
   
       @pytest.mark.large_memory
       @pytest.mark.pandas
       def test_pandas_assertion_error_large_string():
           # Test AssertionError as pandas does not support "U" type strings
           if Version(pd.__version__) < Version("1.5.0"):
               pytest.skip("__dataframe__ added to pandas in 1.5.0")
       
           data = np.array([b'x'*1024]*(3*1024**2), dtype='object')  # 3GB bytes data
           arr = pa.array(data, type=pa.large_string())
           table = pa.table([arr], names=["large_string"])
       
           from pandas.api.interchange import (
               from_dataframe as pandas_from_dataframe
           )
       
   >       with pytest.raises(AssertionError):
   E       Failed: DID NOT RAISE <class 'AssertionError'>
   
   
   /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:294: Failed
   =============================== warnings summary ===============================
   tests/test_pandas.py::TestConvertListTypes::test_to_list_of_maps_pandas
   tests/test_pandas.py::TestConvertListTypes::test_to_list_of_maps_pandas_sliced
   tests/test_pandas.py::test_roundtrip_nested_map_array_with_pydicts_sliced
     /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/dtypes/missing.py:571: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
       if np.any(np.asarray(left_value != right_value)):
   
   tests/test_substrait.py::test_named_table_invalid_table_name
     /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: 'pyarrow._substrait._create_named_table_provider'
     
     Traceback (most recent call last):
       File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/test_substrait.py", line 250, in table_provider
         raise Exception("Unrecognized table name")
     Exception: Unrecognized table name
     
       warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
   
   tests/parquet/test_datetime.py::test_list_of_datetime_time_roundtrip
     /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/parquet/test_datetime.py:361: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
       times = pd.to_datetime(['09:00', '09:30', '10:00', '10:30', '11:00',
   
   -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
   =========================== short test summary info ============================
   ```
   
   No idea why it failed :-(


