[ 
https://issues.apache.org/jira/browse/ARROW-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603090#comment-17603090
 ] 

Joris Van den Bossche edited comment on ARROW-17683 at 9/12/22 1:34 PM:
------------------------------------------------------------------------

Do you know if this is a recurring failure? The test seems to fail on an input 
that already failed in the past as well ({{pa.array(np.array(['', '0', 
'0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'))}} also fails with 
the released version of pyarrow). It is also a hypothesis test, so this might 
be a rare error caused by hypothesis' random parametrization. 
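
For reference, the error message itself can be reproduced with the Python 
standard library alone. The sketch below is only an illustration and does not 
call pyarrow or NumPy: it assumes that a {{<U2}} element is stored as two 
little-endian 32-bit code units, so {{'0\ud800'}} becomes U+0030 followed by 
the lone surrogate U+D800. The helper name {{try_decode}} is made up for this 
example. Decoding those bytes strictly as UTF-32-LE fails on bytes 4-7, which 
matches the traceback quoted in the issue below.

```python
import struct

# Assumed layout of the '<U2' element '0\ud800': two little-endian
# 32-bit code units, U+0030 followed by the lone surrogate U+D800.
raw = struct.pack("<2I", ord("0"), 0xD800)


def try_decode(buf: bytes) -> str:
    """Hypothetical helper: strictly decode UTF-32-LE, reporting any failure."""
    try:
        return buf.decode("utf-32-le")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the offending bytes (end is exclusive).
        return f"{type(exc).__name__} at bytes {exc.start}-{exc.end - 1}: {exc.reason}"


result = try_decode(raw)
print(result)
```

A strict UTF-32 decoder rejects any code point in the surrogate range, so the 
input data is invalid Unicode, whatever pyarrow does with it internally.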

https://crossbow.voltrondata.com/ says the test has been failing for 4 days, 
although there is only 1 failure link.

(That said, the project doesn't seem to be actively maintained anymore, so we 
should maybe consider removing or disabling it in our nightly integration 
builds.)


> [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to 
> UnicodeDecodeError
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17683
>                 URL: https://issues.apache.org/jira/browse/ARROW-17683
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration, Python
>            Reporter: Raúl Cumplido
>            Priority: Major
>              Labels: Nightly
>
> The nightly tests against kartothek are currently failing due to the 
> following error:
> {code:java}
> ______________________ test_eval_operators[<-1-expected3] ______________________
>
> op = '<', value = 1, expected = {'a', 'b', 'c'}
>
>     @pytest.mark.parametrize(
> >       "op, value, expected",
>         [
>             ("==", 1, {"b", "c", "e"}),
>             ("<=", 1, {"a", "b", "c", "e"}),
>             (">=", 1, {"b", "c", "e", "f"}),
>             ("<", 1, {"a", "b", "c"}),
>             (">", 1, {"f"}),
>             ("in", [0, 2], {"a", "b", "c", "f"}),
>         ],
>     )
>
> tests/core/test_index.py:621:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> tests/core/test_index.py:638: in test_eval_operators
>     index_data[2]: ["f"],
> kartothek/core/index.py:614: in __init__
>     normalize_dtype=normalize_dtype,
> kartothek/core/index.py:78: in __init__
>     table = _index_dct_to_table(index_dct, column, None)
> kartothek/core/index.py:949: in _index_dct_to_table
>     labeled_array = pa.array(keys, type=dtype)
> pyarrow/array.pxi:313: in pyarrow.lib.array
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >   ???
> E   UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 4-7: code point in surrogate code point range(0xd800, 0xe000)
> E   Falsifying example: test_eval_operators(
> E       index_data=array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'),
> E       op='<',
> E       value=1,
> E       expected={'a', 'b', 'c'},
> E   )
>
> pyarrow/array.pxi:83: UnicodeDecodeError
> {code}
> An example of a failing build:
> [https://github.com/ursacomputing/crossbow/runs/8296508320?check_suite_focus=true]
>  


