[ 
https://issues.apache.org/jira/browse/ARROW-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109551#comment-16109551
 ] 

Wes McKinney commented on ARROW-1306:
-------------------------------------

This is happening because, on Windows, unicode file names have to be encoded 
to UTF-16LE (see 
https://github.com/apache/arrow/blob/master/python/pyarrow/compat.py#L133). 
This definitely should be fixed -- marked for 0.6.0. I'm not sure of the right 
fix without taking a deeper look: the Arrow file APIs take {{std::string}}, so 
they aren't aware of how the file name is encoded when they generate the error 
message. One way to handle it might be to add some kind of auxiliary data 
structure that holds both the platform-encoded path and a UTF-8 path. On 
Linux/macOS the two will be the same, but we can use the UTF-8 version for 
making error messages.
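
A minimal sketch of the auxiliary structure idea, written in Python for 
brevity (the real change would presumably live in the C++ file APIs). The 
name {{PlatformPath}}, its fields, and the {{windows}} flag are hypothetical 
and not part of Arrow; the only assumption carried over from above is that 
Windows wants UTF-16LE while other platforms use the UTF-8 bytes as-is.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPath:
    """Hypothetical holder for two encodings of one path (not an Arrow API)."""

    utf8: bytes    # always UTF-8; used when building error messages
    native: bytes  # bytes handed to the platform's file-opening call

    @classmethod
    def from_str(cls, path, windows=None):
        # Default to the running platform; the flag exists so both branches
        # can be exercised on any OS.
        if windows is None:
            windows = sys.platform == "win32"
        utf8 = path.encode("utf-8")
        native = path.encode("utf-16-le") if windows else utf8
        return cls(utf8=utf8, native=native)

    def open_error(self):
        # The message is built from the UTF-8 copy, never from `native`.
        return "Failed to open file: " + self.utf8.decode("utf-8")


if __name__ == "__main__":
    p = PlatformPath.from_str("non_existent_file.parquet", windows=True)
    print(p.open_error())
```

The point of carrying both fields is that the code producing the message 
never has to guess how {{native}} was encoded.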

> [Python] Encoding? issue with error reporting for parquet.read_table
> --------------------------------------------------------------------
>
>                 Key: ARROW-1306
>                 URL: https://issues.apache.org/jira/browse/ARROW-1306
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>            Reporter: Chris Bartak
>             Fix For: 0.6.0
>
>
> This only affects error reporting: the filename in the exception for a 
> file that is not found is somehow getting garbled. Example below:
> {code}
> import pyarrow.parquet as pq
> pq.read_table('non_existent_file.parquet')
> ---------------------------------------------------------------------------
> ArrowIOError                              Traceback (most recent call last)
> ----> 1 pq.read_table('non_existent_file.parquet')
> ~\AppData\Local\Continuum\Anaconda3\envs\py36\lib\site-packages\pyarrow\parquet.py
>  in read_table(source, columns, nthreads, metadata, use_pandas_metadata)
>     709                                    metadata=metadata)
>     710 
> --> 711     pf = ParquetFile(source, metadata=metadata)
>     712     return pf.read(columns=columns, nthreads=nthreads,
>     713                    use_pandas_metadata=use_pandas_metadata)
> ~\AppData\Local\Continuum\Anaconda3\envs\py36\lib\site-packages\pyarrow\parquet.py
>  in __init__(self, source, metadata, common_metadata)
>      52     def __init__(self, source, metadata=None, common_metadata=None):
>      53         self.reader = ParquetReader()
> ---> 54         self.reader.open(source, metadata=metadata)
>      55         self.common_metadata = common_metadata
>      56 
> _parquet.pyx in pyarrow._parquet.ParquetReader.open()
> io.pxi in pyarrow.lib.get_reader()
> io.pxi in pyarrow.lib.memory_map()
> io.pxi in pyarrow.lib.MemoryMappedFile._open()
> error.pxi in pyarrow.lib.check_status()
> ArrowIOError: IOError: Failed to open file: 潮彮硥獩整瑮晟汩⹥慰煲敵
> {code}
> versions - Python 3.6 Windows x64
> {code}
>     arrow-cpp:   0.5.0-np112py36_vc14_1 conda-forge [vc14]
>     parquet-cpp: 1.2.0.pre-vc14_3       conda-forge [vc14]
>     pyarrow:     0.5.0-np112py36_vc14_0 conda-forge [vc14]
> {code}
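
For reference, the garbled name in the traceback above looks consistent with 
the single-byte characters of the file name being reinterpreted two at a time 
as UTF-16LE code units. The snippet below is only an illustration of that 
reinterpretation in plain Python; it does not call pyarrow, and it does not 
claim to show where in the Arrow code path the mix-up actually occurs.

```python
name = "non_existent_file.parquet"

# 25 ASCII characters -> 25 bytes in UTF-8.
raw = name.encode("utf-8")

# UTF-16 needs whole 2-byte units, so the odd trailing byte ("t") is dropped.
even = raw[: len(raw) // 2 * 2]

# Reinterpret each byte pair as one little-endian 16-bit code unit.
garbled = even.decode("utf-16-le")

# Text copied from the exception message in the report.
reported = "潮彮硥獩整瑮晟汩⹥慰煲敵"

print(garbled)
print(garbled == reported)
```

For example, the bytes for "no" are 0x6E 0x6F, which read as a little-endian 
16-bit value give U+6F6E, the first character of the reported message.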



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
