pitrou commented on code in PR #45998:
URL: https://github.com/apache/arrow/pull/45998#discussion_r2065856352


##########
cpp/src/arrow/CMakeLists.txt:
##########
@@ -408,15 +408,12 @@ arrow_add_object_library(ARROW_IO
                          io/caching.cc
                          io/compressed.cc
                          io/file.cc
-                         io/hdfs.cc
-                         io/hdfs_internal.cc
                          io/interfaces.cc
                          io/memory.cc
                          io/slow.cc
                          io/stdio.cc
                          io/transform.cc)
 foreach(ARROW_IO_TARGET ${ARROW_IO_TARGETS})
-  target_link_libraries(${ARROW_IO_TARGET} PRIVATE arrow::hadoop)
   if(NOT MSVC)
     target_link_libraries(${ARROW_IO_TARGET} PRIVATE ${CMAKE_DL_LIBS})
   endif()

Review Comment:
   I think `CMAKE_DL_LIBS` was there for HDFS; don't you need to move it as well?
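
   For illustration, the relocation being suggested might look roughly like the sketch below. This is an assumption about the build layout, not the actual setup: `ARROW_FILESYSTEM_TARGETS` is a hypothetical variable standing in for wherever the HDFS sources are compiled after this PR.

```cmake
# Hypothetical sketch: attach the dlopen dependency to the targets that
# now compile the HDFS sources, instead of to every ARROW_IO target.
# ARROW_FILESYSTEM_TARGETS is an assumed name for illustration only.
if(ARROW_HDFS AND NOT MSVC)
  foreach(target ${ARROW_FILESYSTEM_TARGETS})
    target_link_libraries(${target} PRIVATE ${CMAKE_DL_LIBS})
  endforeach()
endif()
```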



##########
cpp/src/arrow/filesystem/CMakeLists.txt:
##########
@@ -143,4 +143,12 @@ endif()
 
 if(ARROW_HDFS)
   add_arrow_test(hdfs_test EXTRA_LABELS filesystem)
+  add_arrow_test(hdfs_test_io

Review Comment:
   Nit: this should be called `hdfs_io_test`, or perhaps better, `hdfs_internals_test`.



##########
cpp/src/arrow/filesystem/hdfs_io.h:
##########
@@ -38,18 +40,6 @@ namespace io {
 class HdfsReadableFile;

Review Comment:
   Most of these declarations should IMHO go into the `arrow::filesystem::internal` namespace, except for `HdfsConnectionConfig`, which can go into `arrow::filesystem`.
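
   A minimal sketch of the suggested layout, with the caveat that the member names below are illustrative placeholders, not Arrow's actual declarations:

```cpp
// Hypothetical sketch of the suggested namespace split for hdfs_io.h.
#include <string>

namespace arrow::filesystem {

// Public: connection parameters stay in arrow::filesystem.
struct HdfsConnectionConfig {
  std::string host;
  int port = 0;
  std::string user;
};

namespace internal {

// Implementation details move under arrow::filesystem::internal.
class HdfsReadableFile { /* ... */ };
class HdfsOutputStream { /* ... */ };

}  // namespace internal

}  // namespace arrow::filesystem
```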



##########
cpp/src/arrow/filesystem/api.h:
##########
@@ -32,3 +32,6 @@
 #ifdef ARROW_S3
 #  include "arrow/filesystem/s3fs.h"  // IWYU pragma: export
 #endif
+#ifdef ARROW_HDFS
+#  include "arrow/filesystem/hdfs_io.h"  // IWYU pragma: export

Review Comment:
   I don't think we need to include this one here. (Also, it's included implicitly by `hdfs.h`.)



##########
cpp/src/arrow/filesystem/hdfs_io.h:
##########
@@ -38,18 +40,6 @@ namespace io {
 class HdfsReadableFile;
 class HdfsOutputStream;
 
-/// DEPRECATED.  Use the FileSystem API in arrow::fs instead.
-struct ObjectType {
-  enum type { FILE, DIRECTORY };
-};
-
-/// DEPRECATED.  Use the FileSystem API in arrow::fs instead.
-struct ARROW_EXPORT FileStatistics {
-  /// Size of file, -1 if finding length is unsupported
-  int64_t size;
-  ObjectType::type kind;
-};
-
 class ARROW_EXPORT FileSystem {

Review Comment:
   OK, but we don't want to keep those two unofficial `FileSystem` and `HadoopFileSystem` classes, which create confusion with the other (public) filesystem classes.
   
   Ideally, those two classes would disappear and their implementation code would get folded into the public `HadoopFileSystem` class.
   
   If that's too annoying, we should at least merge those two classes and give them a less ambiguous name, for example `HdfsClient`.
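
   As a rough sketch of the `HdfsClient` option (all names and members below are illustrative, not a proposal for the actual interface):

```cpp
// Hypothetical: the two internal classes merged into one unambiguously
// named client, clearly distinct from the public arrow::fs::HadoopFileSystem.
namespace arrow::filesystem::internal {

class HdfsClient {
 public:
  // Connection / IO entry points formerly split across the old
  // io::FileSystem and io::HadoopFileSystem classes would live here.
  bool connected() const { return connected_; }

 private:
  bool connected_ = false;
};

}  // namespace arrow::filesystem::internal
```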



##########
cpp/src/arrow/CMakeLists.txt:
##########
@@ -833,7 +830,11 @@ if(ARROW_FILESYSTEM)
                                 PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON)
   endif()
   if(ARROW_HDFS)
-    list(APPEND ARROW_FILESYSTEM_SRCS filesystem/hdfs.cc)
+    list(APPEND
+         ARROW_FILESYSTEM_SRCS
+         filesystem/hdfs.cc
+         filesystem/hdfs_io.cc
+         filesystem/hdfs_internal.cc)

Review Comment:
   Don't we need to link to `arrow::hadoop` as was done above? cc @kou for advice
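
   One way to carry the `arrow::hadoop` dependency over could look like the sketch below; whether the link-libs variables are the right place is exactly the question above, so treat the variable names as assumptions:

```cmake
if(ARROW_HDFS)
  list(APPEND ARROW_FILESYSTEM_SRCS
       filesystem/hdfs.cc
       filesystem/hdfs_io.cc
       filesystem/hdfs_internal.cc)
  # Hypothetical: restore the arrow::hadoop link that the ARROW_IO
  # object libraries no longer provide.
  list(APPEND ARROW_SHARED_LINK_LIBS arrow::hadoop)
  list(APPEND ARROW_STATIC_LINK_LIBS arrow::hadoop)
endif()
```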



##########
python/pyarrow/includes/libarrow_fs.pxd:
##########
@@ -300,6 +300,80 @@ cdef extern from "arrow/filesystem/api.h" namespace "arrow::fs" nogil:
         const CIOContext& io_context,
         int64_t chunk_size, c_bool use_threads)
 
+cdef extern from "arrow/filesystem/hdfs_io.h" namespace "arrow::io" nogil:

Review Comment:
   Are all these declarations actually needed by PyArrow?


