wkarwacki opened a new issue, #42050: URL: https://github.com/apache/arrow/issues/42050
### Describe the bug, including details regarding any error messages, version, and platform.

```
pyarrow==15.0.2
```

Hey! I'm trying to use this function https://arrow.apache.org/docs/python/generated/pyarrow.fs.FileSystem.html#pyarrow.fs.FileSystem.from_uri to obtain a `FileSystem` like below:

```python
from pyarrow.fs import FileSystem

FileSystem.from_uri("hdfs:///some/path")
```

However, even though `core-site.xml` is properly configured, I'm getting:

```
URISyntaxException: Expected authority at index 7: hdfs://java.lang.IllegalArgumentException: Expected authority at index 7: hdfs://
```

I might be mistaken, but it seems that [this code](https://github.com/apache/arrow/blob/7179511a47d4e07322f12a010c5dace89751b411/cpp/src/arrow/filesystem/hdfs.cc#L367) looks for the hostname between the second and third slash characters of such an HDFS URL and does not take the `core-site.xml` config into account at all. I'm able to successfully create a `FileSystem` when providing the namenode explicitly with `hdfs://{namenode}/{path}`.

Currently, we are working around this issue with two parsing strategies:

```python
from typing import Tuple
from urllib.parse import urlparse

from pyarrow.fs import FileSystem, HadoopFileSystem


# The wrapper function name is illustrative; only the body matters.
def resolve_file_system(url: str) -> Tuple[FileSystem, str]:
    parsed = urlparse(url)
    if parsed.scheme == "hdfs":
        hadoop_file_system = HadoopFileSystem("default")  # this properly recognizes core-site.xml
        return (hadoop_file_system, parsed.path)
    else:
        file_system: Tuple[FileSystem, str] = FileSystem.from_uri(uri=url)
        return file_system
```

### Component(s)

C++, Python
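
For illustration, the empty authority in `hdfs:///some/path` can be seen with Python's standard `urllib.parse` alone. This is only a sketch of the URL structure, not Arrow's C++ parser, so it is an analogy for the behaviour described above rather than the actual code path. The `describe` helper and the `namenode:8020` host are made-up placeholders.

```python
from urllib.parse import urlparse


def describe(url):
    """Split a URL into (scheme, authority, path) using the standard library."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.netloc, parsed.path


# No authority between the second and third slash: netloc is an empty string.
print(describe("hdfs:///some/path"))

# Explicit (hypothetical) namenode: netloc carries host and port.
print(describe("hdfs://namenode:8020/some/path"))
```

In the first case there is nothing between the second and third slash, which matches the "Expected authority at index 7" message. The workaround relies on the path component alone and leaves the namenode to `core-site.xml`.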
