[
https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762321#comment-16762321
]
Michael Peleshenko commented on ARROW-4480:
-------------------------------------------
We ran into this issue as well. A coworker dug into it and narrowed it down
to {{pyarrow.filesystem.resolve_filesystem_and_path()}}. The very last return
always returns {{parsed_uri.path}}, regardless of the URI scheme. It seems that
for any non-hdfs scheme the original {{path}} should be returned instead, as
otherwise {{urlparse()}} appears to treat the drive letter as the scheme and
strip it from the path. A workaround for now is to call
{{pyarrow.parquet.write_table()}} with
{{filesystem=LocalFileSystem.get_instance()}}, which takes the early return
that passes {{path}} through unparsed.
{code}
def resolve_filesystem_and_path(where, filesystem=None):
    """
    return filesystem from path which could be an HDFS URI
    """
    if not _is_path_like(where):
        if filesystem is not None:
            raise ValueError("filesystem passed but where is file-like, so"
                             " there is nothing to open with filesystem.")
        return filesystem, where

    # input can be hdfs URI such as hdfs://host:port/myfile.parquet
    path = _stringify_path(where)

    if filesystem is not None:
        return _ensure_filesystem(filesystem), path

    parsed_uri = urlparse(path)
    if parsed_uri.scheme == 'hdfs':
        netloc_split = parsed_uri.netloc.split(':')
        host = netloc_split[0]
        if host == '':
            host = 'default'
        port = 0
        if len(netloc_split) == 2 and netloc_split[1].isnumeric():
            port = int(netloc_split[1])
        fs = pa.hdfs.connect(host=host, port=port)
    else:
        fs = LocalFileSystem.get_instance()

    return fs, parsed_uri.path
{code}
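To illustrate the behaviour described above, here is a minimal sketch using
only the standard library. It shows how {{urlparse()}} handles a Windows path,
and one possible way the final return could distinguish hdfs URIs from plain
paths. The helper name {{resolve_path}} is hypothetical and only mirrors the
last step of the function above; it is not the actual pyarrow fix.

```python
from urllib.parse import urlparse


def resolve_path(path):
    """Hypothetical helper: return the path to use for a given input string.

    Only hdfs:// URIs are reduced to their path component; anything else
    (including a Windows path with a drive letter) is returned unchanged.
    """
    parsed_uri = urlparse(path)
    if parsed_uri.scheme == 'hdfs':
        return parsed_uri.path
    return path


windows_path = r'E:\parquetfiles\file1.parquet'
parsed = urlparse(windows_path)

# urlparse() takes the drive letter for a URI scheme (and lowercases it) ...
print(parsed.scheme)  # e
# ... so the remaining path no longer carries the drive.
print(parsed.path)    # \parquetfiles\file1.parquet

# Returning the original string for non-hdfs input keeps the drive letter.
print(resolve_path(windows_path))                       # E:\parquetfiles\file1.parquet
print(resolve_path('hdfs://host:8020/myfile.parquet'))  # /myfile.parquet
```

This would also be consistent with writes to C: still working: a path without
a drive letter is presumably resolved against the current drive, which is
usually C:.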
> [Python] Drive letter removed when writing parquet file
> --------------------------------------------------------
>
> Key: ARROW-4480
> URL: https://issues.apache.org/jira/browse/ARROW-4480
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0
> Reporter: Seb Fru
> Priority: Major
> Labels: parquet
> Fix For: 0.13.0
>
>
> Hi everyone,
>
> importing this from Github:
>
> I encountered a problem while working with pyarrow: I am working on Windows
> 10. When I want to save a table using pq.write_table(tab,
> r'E:\parquetfiles\file1.parquet'), I get the Error "No such file or
> directory".
> After searching a bit, I found out that the drive letter is getting removed
> while parsing the where string, but I could not find a way to solve my problem:
> I can write the files on my C:\ drive without problems, but I am not able to
> write a parquet file to any drive other than C:.
> Am I doing something wrong or is this just how it works? I would really
> appreciate any help, because I just cannot fit my files on C: drive.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)