vanderliang created ARROW-4943:
----------------------------------
Summary: pyarrow.lib.HadoopFileSystem._connect failed due to TypeError
Key: ARROW-4943
URL: https://issues.apache.org/jira/browse/ARROW-4943
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.12.1
Environment: Kernel: 4.4.95.x86_64
Python: 2.7.5
Reporter: vanderliang
When running the pytorch_hello_world.py script from [https://github.com/uber/petastorm.git],
it fails with the TypeError shown below.
It seems that pyarrow.lib.HadoopFileSystem._connect requires a unicode
argument; however, the argument passed in is always a str. The proposed fix
is to add a unicode() conversion so that the argument is always of unicode type.
Traceback (most recent call last):
  File "pytorch_hello_world.py", line 31, in <module>
    pytorch_hello_world()
  File "pytorch_hello_world.py", line 25, in pytorch_hello_world
    with DataLoader(make_reader(dataset_url)) as train_loader:
  File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
    resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
  File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
    self._filesystem = connector.connect_to_either_namenode(namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
    return HAHdfsClient(cls, list_of_namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
    self._do_connect()
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
    self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
    cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
    return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
    extra_conf=extra_conf)
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
TypeError: Expected unicode, got str
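The unicode() conversion proposed in this report can be sketched on the caller side as follows. This is a hedged workaround sketch, not the fix adopted in pyarrow or petastorm: the helper names to_text and connect_hdfs are hypothetical, and because the traceback does not show which argument _connect rejects, the sketch converts every string argument (host, user, kerb_ticket, driver) to text before calling the legacy pyarrow.hdfs.connect API seen in the traceback. The port is left as an int.

```python
# Hypothetical caller-side workaround sketch for ARROW-4943.
# Assumption: on Python 2.7, HadoopFileSystem._connect rejects byte strings
# (plain `str`) for its string arguments, so they are normalized to unicode.
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string; None is passed through."""
    if value is None or isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # On Python 2, `str` is an alias of `bytes`, so plain literals such as
        # 'default' or the hostname returned by urlparse() land here.
        return value.decode(encoding)
    return text_type(value)


def connect_hdfs(host="default", port=8020, user=None, kerb_ticket=None,
                 driver="libhdfs", connect=None):
    """Hypothetical wrapper that passes only text arguments to the connector.

    `connect` can be injected (e.g. for testing); by default the legacy
    pyarrow.hdfs.connect function from pyarrow 0.12.x is imported lazily.
    """
    if connect is None:
        import pyarrow.hdfs  # imported here so to_text() works without pyarrow
        connect = pyarrow.hdfs.connect
    return connect(to_text(host), port,
                   user=to_text(user),
                   kerb_ticket=to_text(kerb_ticket),
                   driver=to_text(driver))
```

On Python 3 a plain str is already text, so the conversion only changes behavior on Python 2, where it turns the str produced by urlparse() into unicode before it reaches _connect.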
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)