pitrou commented on code in PR #45998:
URL: https://github.com/apache/arrow/pull/45998#discussion_r2065856352
########## cpp/src/arrow/CMakeLists.txt: ##########
@@ -408,15 +408,12 @@ arrow_add_object_library(ARROW_IO
                          io/caching.cc
                          io/compressed.cc
                          io/file.cc
-                         io/hdfs.cc
-                         io/hdfs_internal.cc
                          io/interfaces.cc
                          io/memory.cc
                          io/slow.cc
                          io/stdio.cc
                          io/transform.cc)

 foreach(ARROW_IO_TARGET ${ARROW_IO_TARGETS})
-  target_link_libraries(${ARROW_IO_TARGET} PRIVATE arrow::hadoop)
   if(NOT MSVC)
     target_link_libraries(${ARROW_IO_TARGET} PRIVATE ${CMAKE_DL_LIBS})
   endif()

Review Comment:
   I think `CMAKE_DL_LIBS` was for HDFS, you need to move it as well?

########## cpp/src/arrow/filesystem/CMakeLists.txt: ##########
@@ -143,4 +143,12 @@ endif()

 if(ARROW_HDFS)
   add_arrow_test(hdfs_test EXTRA_LABELS filesystem)
+  add_arrow_test(hdfs_test_io

Review Comment:
   Nit: should call this `hdfs_io_test`, or perhaps better `hdfs_internals_test`

########## cpp/src/arrow/filesystem/hdfs_io.h: ##########
@@ -38,18 +40,6 @@ namespace io {

 class HdfsReadableFile;

Review Comment:
   Most of these declarations should IMHO go into the `arrow::filesystem::internal` namespace, except for `HdfsConnectionConfig` which can go into `arrow::filesystem`.

########## cpp/src/arrow/filesystem/api.h: ##########
@@ -32,3 +32,6 @@
 #ifdef ARROW_S3
 # include "arrow/filesystem/s3fs.h"  // IWYU pragma: export
 #endif
+#ifdef ARROW_HDFS
+# include "arrow/filesystem/hdfs_io.h"  // IWYU pragma: export

Review Comment:
   I don't think we need to include this one here. (also, it's included implicitly by `hdfs.h`)

########## cpp/src/arrow/filesystem/hdfs_io.h: ##########
@@ -38,18 +40,6 @@ namespace io {

 class HdfsReadableFile;
 class HdfsOutputStream;

-/// DEPRECATED. Use the FileSystem API in arrow::fs instead.
-struct ObjectType {
-  enum type { FILE, DIRECTORY };
-};
-
-/// DEPRECATED. Use the FileSystem API in arrow::fs instead.
-struct ARROW_EXPORT FileStatistics {
-  /// Size of file, -1 if finding length is unsupported
-  int64_t size;
-  ObjectType::type kind;
-};
-
 class ARROW_EXPORT FileSystem {

Review Comment:
   Ok, but we don't want to keep those two unofficial `FileSystem` and `HadoopFileSystem` classes which create confusion with the other (public) filesystem classes. Ideally, those two classes disappear and their implementation code gets folded into the public `HadoopFileSystem` class. If that's too annoying, we should at least merge those two classes and give them a less ambiguous name, for example `HdfsClient`.

########## cpp/src/arrow/CMakeLists.txt: ##########
@@ -833,7 +830,11 @@ if(ARROW_FILESYSTEM)
                           PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON)
   endif()
   if(ARROW_HDFS)
-    list(APPEND ARROW_FILESYSTEM_SRCS filesystem/hdfs.cc)
+    list(APPEND
+         ARROW_FILESYSTEM_SRCS
+         filesystem/hdfs.cc
+         filesystem/hdfs_io.cc
+         filesystem/hdfs_internal.cc)

Review Comment:
   Don't we need to link to `arrow::hadoop` as was done above? cc @kou for advice

########## python/pyarrow/includes/libarrow_fs.pxd: ##########
@@ -300,6 +300,80 @@ cdef extern from "arrow/filesystem/api.h" namespace "arrow::fs" nogil:
                            const CIOContext& io_context,
                            int64_t chunk_size, c_bool use_threads)

+cdef extern from "arrow/filesystem/hdfs_io.h" namespace "arrow::io" nogil:

Review Comment:
   Are all these declarations actually needed by PyArrow?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
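Taken together, the two CMake comments (moving `CMAKE_DL_LIBS`, and re-linking `arrow::hadoop` next to the new filesystem sources) might resolve as something like the sketch below. This is not the actual patch: the `ARROW_FILESYSTEM_TARGETS` loop variable and the exact placement are assumptions, and @kou's advice may point elsewhere.

```cmake
# Hypothetical sketch only: HDFS-specific linkage moved alongside the
# HDFS filesystem sources, mirroring what the removed ARROW_IO loop did.
if(ARROW_HDFS)
  list(APPEND
       ARROW_FILESYSTEM_SRCS
       filesystem/hdfs.cc
       filesystem/hdfs_io.cc
       filesystem/hdfs_internal.cc)
  # Assumed target-list variable name; the real one may differ.
  foreach(ARROW_FILESYSTEM_TARGET ${ARROW_FILESYSTEM_TARGETS})
    # arrow::hadoop was previously linked in cpp/src/arrow/CMakeLists.txt
    target_link_libraries(${ARROW_FILESYSTEM_TARGET} PRIVATE arrow::hadoop)
    # CMAKE_DL_LIBS presumably needed because libhdfs is loaded at runtime
    if(NOT MSVC)
      target_link_libraries(${ARROW_FILESYSTEM_TARGET} PRIVATE ${CMAKE_DL_LIBS})
    endif()
  endforeach()
endif()
```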
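The namespace move and the `HdfsClient` rename suggested in the review could combine into a header layout along these lines. A sketch under stated assumptions only: `ARROW_EXPORT` is stubbed out so the snippet compiles standalone, and the `HdfsConnectionConfig` fields and `HdfsClient` members are invented for illustration, not Arrow's actual definitions.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Stub so the sketch compiles outside Arrow; in Arrow this is the
// visibility macro from the util headers.
#define ARROW_EXPORT

namespace arrow::filesystem {

// Public, per the review: connection parameters stay user-visible
// in arrow::filesystem. Field names here are illustrative.
struct ARROW_EXPORT HdfsConnectionConfig {
  std::string host;
  int port = 0;
  std::string user;
  std::unordered_map<std::string, std::string> extra_conf;
};

namespace internal {

// The two legacy classes (`FileSystem` / `HadoopFileSystem`) merged
// under the less ambiguous name suggested in the review; the real
// I/O methods are elided.
class ARROW_EXPORT HdfsClient {
 public:
  explicit HdfsClient(HdfsConnectionConfig config)
      : config_(std::move(config)) {}
  const HdfsConnectionConfig& config() const { return config_; }

 private:
  HdfsConnectionConfig config_;
};

}  // namespace internal
}  // namespace arrow::filesystem
```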