Copilot commented on code in PR #45998:
URL: https://github.com/apache/arrow/pull/45998#discussion_r3505060158
##########
python/pyarrow/_hdfs.pyx:
##########
@@ -23,9 +23,22 @@ from pyarrow.includes.libarrow_fs cimport *
from pyarrow._fs cimport FileSystem
from pyarrow.lib import frombytes, tobytes
+from pyarrow.lib cimport check_status
from pyarrow.util import _stringify_path
+def _have_libhdfs():
+ """
+ Return true if HDFS (HadoopFileSystem) library is set up correctly.
+ """
+ try:
+ with nogil:
+ check_status(HaveLibHdfs())
+ return True
+ except Exception:
+ return False
Review Comment:
`HdfsOptions` in C++ no longer has a `connection_config` field (it now
stores host/port/user/kerb_ticket/extra_conf as private members with
accessors), but the Python bindings still model and use
`opts.connection_config.*` (e.g. in `HadoopFileSystem.__reduce__`). This will
fail to compile (and/or will be incorrect ABI-wise) once the C++ refactor
lands. Update the bindings to use the new `HdfsOptions` accessors (`host()`,
`port()`, `user()`, `kerb_ticket()`, `extra_conf()`) and adjust the `.pxd`
accordingly (remove `HdfsConnectionConfig` / `connection_config`).
##########
python/pyarrow/includes/libarrow_fs.pxd:
##########
@@ -311,6 +311,14 @@ cdef extern from "arrow/filesystem/api.h" namespace
"arrow::fs" nogil:
const CIOContext& io_context,
int64_t chunk_size, c_bool use_threads)
+ CStatus HaveLibHdfs()
+
+ cdef cppclass HdfsConnectionConfig:
+ c_string host
+ int port
+ c_string user
+ c_string kerb_ticket
+ unordered_map[c_string, c_string] extra_conf
Review Comment:
The Cython declarations for `arrow::fs::HdfsOptions` still expose a
`connection_config` member and add a new `HdfsConnectionConfig` type, but
`arrow::fs::HdfsOptions` no longer has that member/type (it now uses private
fields with accessors). This mismatch breaks the generated C++ when code
accesses `opts.connection_config` (see `_hdfs.pyx.__reduce__`) and should be
updated to match the new C++ header (`cpp/src/arrow/filesystem/hdfs.h`).
##########
python/pyarrow/io.pxi:
##########
@@ -46,18 +46,6 @@ cdef extern from "Python.h":
char *v, Py_ssize_t len) except NULL
Review Comment:
`pyarrow.io.have_libhdfs()` was removed, but `pyarrow.have_libhdfs()` is
still provided. If `pyarrow.io.have_libhdfs` is part of the public Python API,
this is a backwards-incompatible removal; consider keeping a thin wrapper in
`pyarrow.io` that delegates to `pyarrow.have_libhdfs()` (or to
`_hdfs._have_libhdfs`) so existing user code keeps working.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]