[ https://issues.apache.org/jira/browse/ARROW-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036395#comment-16036395 ]

Jeff Knupp commented on ARROW-1088:
-----------------------------------

Well, the issue is that functions like `os.stat` and `os.remove` are called indirectly on the file, and its name (which, within Python, can of course contain any characters) has to be encoded with the system's filesystem encoding, which in this setup is ASCII. Below is the traceback output:

{code}
root@183bbfdb623a:/home/ubuntu/arrow/arrow/python# py.test pyarrow
==================================================================== test session starts ====================================================================
platform linux -- Python 3.5.3, pytest-3.1.1, py-1.4.33, pluggy-0.4.0
rootdir: /home/ubuntu/arrow/arrow/python, inifile: setup.cfg
collected 185 items / 2 skipped

pyarrow/tests/test_array.py ...........
pyarrow/tests/test_convert_builtin.py ......................
pyarrow/tests/test_convert_pandas.py ............................x....
pyarrow/tests/test_deprecations.py ..
pyarrow/tests/test_feather.py .......................x..FE.
pyarrow/tests/test_io.py ..................
pyarrow/tests/test_ipc.py .............x
pyarrow/tests/test_jemalloc.py ..
pyarrow/tests/test_scalars.py ..........
pyarrow/tests/test_schema.py ..............
pyarrow/tests/test_table.py ...............
pyarrow/tests/test_tensor.py ................

========================================================================== ERRORS ===========================================================================
_______________________________________________ ERROR at teardown of TestFeatherReader.test_unicode_filename ________________________________________________

self = <pyarrow.tests.test_feather.TestFeatherReader testMethod=test_unicode_filename>

    def tearDown(self):
        for path in self.test_files:
            try:
>               os.remove(path)
E               UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 10: ordinal not in range(128)

pyarrow/tests/test_feather.py:45: UnicodeEncodeError
========================================================================= FAILURES ==========================================================================
__________________________________________________________ TestFeatherReader.test_unicode_filename __________________________________________________________

self = <pyarrow.tests.test_feather.TestFeatherReader testMethod=test_unicode_filename>

    def test_unicode_filename(self):
        # GH #209
        name = (b'Besa_Kavaj\xc3\xab.feather').decode('utf-8')
        df = pd.DataFrame({'foo': [1, 2, 3, 4]})
>       self._check_pandas_roundtrip(df, path=name)

pyarrow/tests/test_feather.py:362:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/tests/test_feather.py:71: in _check_pandas_roundtrip
    if not os.path.exists(path):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = 'Besa_Kavaj\xeb.feather'

    def exists(path):
        """Test whether a path exists.  Returns False for broken symbolic links"""
        try:
>           os.stat(path)
E           UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 10: ordinal not in range(128)

/usr/lib/python3.5/genericpath.py:19: UnicodeEncodeError
{code}
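A minimal sketch of the same failure outside pyarrow, assuming (as the 'ascii' codec in the traceback suggests) that the container's filesystem encoding is ASCII. The test's name decodes to `'Besa_Kavaj\xeb.feather'`, and encoding it with the ascii codec fails at position 10, the same offset reported for both `os.stat` and `os.remove`. `encode_with` is a hypothetical helper, and the sketch calls `str.encode` directly instead of changing the locale, so it behaves the same on any machine.

```python
# Reproduce the UnicodeEncodeError from the traceback without touching the
# filesystem: encode the test's filename the way an ASCII filesystem
# encoding would.

# Same construction as test_unicode_filename: UTF-8 bytes decoded to str.
name = (b'Besa_Kavaj\xc3\xab.feather').decode('utf-8')


def encode_with(path, encoding):
    """Hypothetical helper: encode ``path`` with ``encoding``.

    Returns the encoded bytes, or the UnicodeEncodeError instance if the
    encoding cannot represent the path.
    """
    try:
        return path.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


err = encode_with(name, 'ascii')
print(type(err).__name__, err.start)   # UnicodeEncodeError 10
print(encode_with(name, 'utf-8'))      # b'Besa_Kavaj\xc3\xab.feather'
```

The 'ë' (U+00EB) at index 10 is the only non-ASCII character, so any ASCII-only filesystem encoding rejects the name before the OS is ever asked about the file.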

> [Python] test_unicode_filename test fails when unicode filenames aren't supported by system
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1088
>                 URL: https://issues.apache.org/jira/browse/ARROW-1088
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Jeff Knupp
>
> Building/running pyarrow in Docker using Ubuntu 17.04 as a base (with no
> other modification) fails {{test_unicode_filename}} because unicode filenames
> are apparently not supported by default in this setup. This is further
> confirmed by {{os.path.supports_unicode_filenames}} being {{False}}. This
> test should be skipped in such situations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
