wkarwacki opened a new issue, #42050:
URL: https://github.com/apache/arrow/issues/42050

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   pyarrow==15.0.2
   ```
   
   Hey! I'm trying to use this function https://arrow.apache.org/docs/python/generated/pyarrow.fs.FileSystem.html#pyarrow.fs.FileSystem.from_uri to obtain a `FileSystem`, as shown below:
   ```python
   from pyarrow.fs import FileSystem
   
   FileSystem.from_uri("hdfs:///some/path")
   ```
   However, even though `core-site.xml` is properly configured, I'm getting:
   
   ```
   URISyntaxException: Expected authority at index 7: 
hdfs://java.lang.IllegalArgumentException: Expected authority at index 7: 
hdfs://
   ```
   I might be mistaken, but it seems that [this code](https://github.com/apache/arrow/blob/7179511a47d4e07322f12a010c5dace89751b411/cpp/src/arrow/filesystem/hdfs.cc#L367) looks for a hostname between the second and third slash characters of such an HDFS URL and does not take the `core-site.xml` config into account at all.
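
To illustrate the URI shape involved, here is a minimal sketch using Python's standard `urllib.parse` rather than Arrow's own C++ parser, so it only shows how the two URI forms differ and says nothing definitive about what Arrow does internally. The host `namenode.example.com` and port `8020` are placeholders chosen for the comparison.

```python
from urllib.parse import urlparse

# URI from the report: nothing between "hdfs://" and the path,
# so the authority (host[:port]) component is empty.
without_authority = urlparse("hdfs:///some/path")

# Placeholder namenode host and port, used only for comparison.
with_authority = urlparse("hdfs://namenode.example.com:8020/some/path")

print(repr(without_authority.netloc))  # authority part of the first URI
print(without_authority.path)
print(with_authority.hostname, with_authority.port)
print(with_authority.path)
```

Both URIs carry the same path; only the second one names a host, which would be consistent with the "Expected authority" error appearing for the first form.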
   
   I'm able to successfully create a `FileSystem` when providing the namenode explicitly with `hdfs://{namenode}/{path}`. Currently, we work around this issue by parsing the URL in one of two ways:
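   ```python
   from typing import Tuple
   from urllib.parse import urlparse

   from pyarrow.fs import FileSystem, HadoopFileSystem


   # Hypothetical helper name; it wraps the two parsing strategies.
   def resolve_file_system(url: str) -> Tuple[FileSystem, str]:
       parsed = urlparse(url)
       if parsed.scheme == "hdfs":
           # this properly recognizes core-site.xml
           hadoop_file_system = HadoopFileSystem("default")
           return (hadoop_file_system, parsed.path)
       else:
           file_system: Tuple[FileSystem, str] = FileSystem.from_uri(uri=url)
           return file_system
   ```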
   
   ### Component(s)
   
   C++, Python

