Will Jones created ARROW-17069:
----------------------------------
Summary: [Python][R] GCSFileSystem reports cannot resolve host on
public buckets
Key: ARROW-17069
URL: https://issues.apache.org/jira/browse/ARROW-17069
Project: Apache Arrow
Issue Type: Bug
Components: Python, R
Affects Versions: 8.0.0
Reporter: Will Jones
Assignee: Will Jones
Fix For: 9.0.0
When reading from a public bucket, GCSFileSystem returns
{{Couldn't resolve host name}} unless you supply {{anonymous}} as the user
in the URI:
{code:python}
import pyarrow.dataset as ds
# Fails:
dataset = ds.dataset("gs://voltrondata-labs-datasets/taxi-data/?retry_limit_seconds=3")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
#     return _filesystem_dataset(source, **kwargs)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
#     fs, paths_or_selector = _ensure_single_source(source, filesystem)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 417, in _ensure_single_source
#     raise FileNotFoundError(path)
# FileNotFoundError: voltrondata-labs-datasets/taxi-data
# This works fine:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
{code}
Since the bucket is public, I would expect to be able to connect without
specifying a user.
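Until this is fixed, one possible stopgap is to add the {{anonymous}} user to the URI before handing it to {{ds.dataset}}. The sketch below uses only the standard library; {{with_anonymous_user}} is a hypothetical helper name, not part of pyarrow, and it assumes the bucket really is public and that no other credentials are wanted.

```python
from urllib.parse import urlsplit, urlunsplit


def with_anonymous_user(uri: str) -> str:
    """Return `uri` with `anonymous@` added as the user.

    Hypothetical helper, not a pyarrow API. URIs that are not `gs://`
    or that already name a user are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs" or "@" in parts.netloc:
        return uri
    # Prefix the bucket name with the anonymous user; keep path and query as-is.
    return urlunsplit(parts._replace(netloc="anonymous@" + parts.netloc))


uri = with_anonymous_user(
    "gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
)
print(uri)
```

The rewritten URI can then be passed to {{ds.dataset(uri)}} as in the working call above. This only papers over the problem on the caller's side; it does not change how GCSFileSystem resolves credentials.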
--
This message was sent by Atlassian Jira
(v8.20.10#820010)