vanderliang created ARROW-4943:
----------------------------------

             Summary: pyarrow.lib.HadoopFileSystem._connect failed due to TypeError
                 Key: ARROW-4943
                 URL: https://issues.apache.org/jira/browse/ARROW-4943
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: Kernel: 4.4.95.x86_64
Python: 2.7.5
            Reporter: vanderliang


 

Running the pytorch_hello_world.py script from
[https://github.com/uber/petastorm.git] fails with the TypeError shown below.

It seems that pyarrow.lib.HadoopFileSystem._connect requires a unicode
argument, but under Python 2 the argument passed in is always a str. I propose
adding a unicode() conversion to make sure that the argument is unicode.

Traceback (most recent call last):
  File "pytorch_hello_world.py", line 31, in <module>
    pytorch_hello_world()
  File "pytorch_hello_world.py", line 25, in pytorch_hello_world
    with DataLoader(make_reader(dataset_url)) as train_loader:
  File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
    resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
  File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
    self._filesystem = connector.connect_to_either_namenode(namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
    return HAHdfsClient(cls, list_of_namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
    self._do_connect()
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
    self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
    cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
  File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
    return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
    extra_conf=extra_conf)
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
TypeError: Expected unicode, got str
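
The unicode() conversion proposed above could be sketched roughly as follows.
This is only an illustration: ensure_text is a hypothetical helper, not part of
pyarrow or petastorm, and the traceback does not show which argument to
_connect is the offending str, so which values to wrap at the call site is an
assumption.

```python
# Hypothetical helper (not part of pyarrow or petastorm).
# Works on both Python 2 and Python 3.
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string, decoding bytes if needed."""
    if value is None or isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    raise TypeError("expected a string, got %r" % type(value))


# Assumed call site, mirroring the frame in petastorm/hdfs/namenode.py
# (commented out because it needs a reachable HDFS namenode):
# pyarrow.hdfs.connect(ensure_text(url.hostname or 'default'),
#                      url.port or 8020,
#                      driver=ensure_text(driver))
```

Decoding explicitly, instead of calling unicode() directly, keeps the same code
usable on Python 3 and leaves None values untouched.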



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
