[ 
https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762321#comment-16762321
 ] 

Michael Peleshenko commented on ARROW-4480:
-------------------------------------------

We ran into this issue as well. A coworker dug into it and narrowed it down 
to {{pyarrow.filesystem.resolve_filesystem_and_path()}}. The very last return 
always returns {{parsed_uri.path}}, regardless of the scheme. For a non-hdfs 
scheme it seems that just {{path}} should be returned, because otherwise 
{{urlparse()}} treats the drive letter as the URI scheme and strips it from 
the path. A workaround for now is to call {{pyarrow.parquet.write_table()}} 
with {{filesystem=LocalFileSystem.get_instance()}}, which takes the early 
return before {{urlparse()}} is reached. The function in question:
{code}
def resolve_filesystem_and_path(where, filesystem=None):
    """
    return filesystem from path which could be an HDFS URI
    """
    if not _is_path_like(where):
        if filesystem is not None:
            raise ValueError("filesystem passed but where is file-like, so"
                             " there is nothing to open with filesystem.")
        return filesystem, where

    # input can be hdfs URI such as hdfs://host:port/myfile.parquet
    path = _stringify_path(where)

    if filesystem is not None:
        return _ensure_filesystem(filesystem), path

    parsed_uri = urlparse(path)
    if parsed_uri.scheme == 'hdfs':
        netloc_split = parsed_uri.netloc.split(':')
        host = netloc_split[0]
        if host == '':
            host = 'default'
        port = 0
        if len(netloc_split) == 2 and netloc_split[1].isnumeric():
            port = int(netloc_split[1])
        fs = pa.hdfs.connect(host=host, port=port)
    else:
        fs = LocalFileSystem.get_instance()

    return fs, parsed_uri.path
{code}
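
To make the stripping concrete, here is a minimal sketch that uses only the standard library. {{split_like_resolve}} and {{resolve_path_sketch}} are hypothetical helper names, not part of pyarrow. The first mirrors what the final return of the function above does to the path; the second illustrates the suggested change of returning the original path for non-hdfs input. It is an illustration of the idea, not the actual patch.

```python
from urllib.parse import urlparse


def split_like_resolve(where):
    """Mirror the final `return fs, parsed_uri.path` of the function above."""
    parsed_uri = urlparse(where)
    return parsed_uri.scheme, parsed_uri.path


def resolve_path_sketch(where):
    """Hypothetical fix: use the parsed path only for hdfs URIs."""
    parsed_uri = urlparse(where)
    if parsed_uri.scheme == 'hdfs':
        return parsed_uri.path
    # Any other "scheme", including a Windows drive letter, keeps the
    # original string untouched.
    return where


windows_path = r'E:\parquetfiles\file1.parquet'
hdfs_uri = 'hdfs://host:8020/myfile.parquet'

# The drive letter ends up in .scheme and is missing from .path.
print(split_like_resolve(windows_path))

# With the sketched change, the Windows path survives and the hdfs URI
# is still reduced to its path component.
print(resolve_path_sketch(windows_path))
print(resolve_path_sketch(hdfs_uri))
```

Because {{urlparse()}} accepts any leading run of letters followed by a colon as a scheme, the check has to be on the scheme value rather than on whether a scheme is present at all.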

> [Python] Drive letter removed when writing parquet file 
> --------------------------------------------------------
>
>                 Key: ARROW-4480
>                 URL: https://issues.apache.org/jira/browse/ARROW-4480
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>            Reporter: Seb Fru
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.13.0
>
>
> Hi everyone,
>   
>  importing this from Github:
>   
>  I encountered a problem while working with pyarrow: I am working on Windows 
> 10. When I want to save a table using pq.write_table(tab, 
> r'E:\parquetfiles\file1.parquet'), I get the Error "No such file or 
> directory".
>   After searching a bit, I found out that the drive letter is getting removed 
> while parsing the where string, but I could not find a way to solve my problem: 
> I can write the files on my C:\ drive without problems, but I am not able to 
> write a parquet file on another drive than C:.
>  Am I doing something wrong or is this just how it works? I would really 
> appreciate any help, because I just cannot fit my files on C: drive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
