This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 71293212f2 GH-47560: [C++] Fix host handling for default HDFS URI
(#47458)
71293212f2 is described below
commit 71293212f2006d6b224fa7d0658c1f6d51689b83
Author: Diego Sevilla Ruiz <[email protected]>
AuthorDate: Tue Sep 23 18:52:30 2025 +0200
GH-47560: [C++] Fix host handling for default HDFS URI (#47458)
### Rationale for this change
In #25324 a fix was introduced for the Python HadoopFileSystem, but it does
not take effect when you use `from_uri()`, because the URI is passed to the
underlying C++ implementation of the options parsing. There, the "default" case
is not handled as in the Python case: the whole "hdfs://default" string is
passed to the underlying HDFS library, which expects the bare string "default"
in order to look up the filesystem in `$HADOOP_CONF_DIR/core-site.xml`.
### What changes are included in this PR?
Handle the special "default" HDFS URIs in `HadoopFileSystem.from_uri()` (or in
`FileSystem.from_uri()` when using `hdfs://default:xxx`).
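
As a rough illustration of the host selection described above, here is a minimal standalone sketch. The helper name `SelectHdfsHost` is hypothetical and is not part of Arrow; the actual change lives inline in `HdfsOptions::FromUri` and reads the scheme and host from an `arrow::util::Uri`.

```cpp
#include <string>

// Hypothetical helper (not an Arrow API): mirrors the host selection made in
// HdfsOptions::FromUri by this change.
//
// libhdfs treats the bare string "default" as a request to pick the default
// filesystem from core-site.xml, so the scheme must not be prepended in that
// case. Any other host keeps the "<scheme>://<host>" form.
std::string SelectHdfsHost(const std::string& scheme, const std::string& host) {
  if (host == "default") {
    return host;
  }
  return scheme + "://" + host;
}
```

Under these assumptions, `SelectHdfsHost("hdfs", "default")` yields `default`, while `SelectHdfsHost("hdfs", "namenode")` yields `hdfs://namenode`.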
### Are these changes tested?
There are no specific tests for this feature, but existing HDFS CI jobs
pass.
### Are there any user-facing changes?
Not exactly, but the documented behavior is now honored in the `from_uri()` case.
* GitHub Issue: #47560
Lead-authored-by: Diego Sevilla Ruiz <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
---
cpp/src/arrow/filesystem/hdfs.cc | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/cpp/src/arrow/filesystem/hdfs.cc b/cpp/src/arrow/filesystem/hdfs.cc
index d59b2a342d..adb8b0d50d 100644
--- a/cpp/src/arrow/filesystem/hdfs.cc
+++ b/cpp/src/arrow/filesystem/hdfs.cc
@@ -363,8 +363,14 @@ Result<HdfsOptions> HdfsOptions::FromUri(const Uri& uri) {
options_map.emplace(kv.first, kv.second);
}
+ // Special case host = "default" or "hdfs://default" as stated by GH-47560.
+ // If given the string "default", libhdfs selects the default filesystem
+ // from `core-site.xml`.
std::string host;
- host = uri.scheme() + "://" + uri.host();
+ if (uri.host() == "default")
+ host = uri.host();
+ else
+ host = uri.scheme() + "://" + uri.host();
// configure endpoint
const auto port = uri.port();