anjakefala commented on issue #33618:
URL: https://github.com/apache/arrow/issues/33618#issuecomment-1423158795

   I wrote a first draft (that does not compile yet), and ran into a few 
complications:
   
   * `readdir` expects a directory stream pointer and returns the next 
directory entry in that stream: 
https://man7.org/linux/man-pages/man3/readdir.3.html. A directory stream is 
obtained from a library function such as `opendir`: 
https://man7.org/linux/man-pages/man3/opendir.3.html. So `readdir` is not a 
one-to-one replacement for `stat`, which accepts a path to a specific entry. 
My understanding is that we cannot give `readdir` a path to an entry and ask 
whether that entry is a file or a directory. The usual pattern is to loop over 
the entries of a directory and check the type of each one, e.g. 
https://stackoverflow.com/questions/39429803/how-to-list-first-level-directories-only-in-c/39430337#39430337.
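
To make the mismatch concrete, here is a minimal sketch of what answering "is this entry a file or a directory?" with `readdir` could look like: open the parent directory, scan it for the name, and read `d_type`. The helper name `KindViaReaddir` and the `EntryKind` enum are hypothetical, not Arrow code. The sketch also assumes a platform that provides `d_type` (Linux and the BSDs do, but POSIX does not require it, and some filesystems only ever report `DT_UNKNOWN`).

```cpp
#include <dirent.h>

#include <optional>
#include <string>

// Hypothetical classification used only for this illustration.
enum class EntryKind { kFile, kDirectory, kOther, kUnknown };

// Hypothetical helper: find `name` inside `parent_dir` by scanning the
// directory stream. Returns std::nullopt if the directory cannot be opened
// or contains no entry with that name.
std::optional<EntryKind> KindViaReaddir(const std::string& parent_dir,
                                        const std::string& name) {
  DIR* dir = opendir(parent_dir.c_str());
  if (dir == nullptr) return std::nullopt;

  std::optional<EntryKind> result;
  while (struct dirent* entry = readdir(dir)) {
    if (name != entry->d_name) continue;
    switch (entry->d_type) {
      case DT_REG: result = EntryKind::kFile; break;
      case DT_DIR: result = EntryKind::kDirectory; break;
      // The filesystem did not report a type; a stat call would be needed.
      case DT_UNKNOWN: result = EntryKind::kUnknown; break;
      // Symlinks, sockets, devices, and so on.
      default: result = EntryKind::kOther; break;
    }
    break;
  }
  closedir(dir);
  return result;
}
```

The cost is a scan of the parent directory for every lookup, and a symlink is reported as a symlink rather than as its target, which differs from what `stat` returns.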
   
   There is probably a way to use `readdir` to get the information we want, but 
before I invest more time into that, I wanted to suggest an alternative 
approach.
   
   It might be simpler to use `std::filesystem::is_directory` 
(https://en.cppreference.com/w/cpp/filesystem/is_directory) and 
`std::filesystem::is_regular_file` instead; both are available since C++17. My 
understanding is that Apache Arrow requires a C++17 compiler, and that these 
functions are faster than `stat` (though, of course, I will verify this).
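
As a rough sketch of that alternative, the snippet below classifies a path with a single `std::filesystem::status` query and the non-throwing `std::error_code` overload. `ClassifyPath` and `PathKind` are hypothetical names chosen for illustration and do not come from the Arrow codebase.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type used only for this illustration.
enum class PathKind { kFile, kDirectory, kOther, kNotFound, kError };

// Hypothetical helper: query the status once, then inspect it, so the
// filesystem is not hit separately for each is_* check.
PathKind ClassifyPath(const std::string& path) {
  std::error_code ec;
  const fs::file_status st = fs::status(path, ec);  // follows symlinks
  if (st.type() == fs::file_type::not_found) return PathKind::kNotFound;
  if (ec) return PathKind::kError;
  if (fs::is_directory(st)) return PathKind::kDirectory;
  if (fs::is_regular_file(st)) return PathKind::kFile;
  return PathKind::kOther;
}
```

Passing the `file_status` to `is_directory` and `is_regular_file` avoids a second lookup for the same path. As far as I know, common POSIX standard library implementations build `status` on top of `stat`, so any speed difference would need to be measured rather than assumed.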
   
   I am going to explore the `std::filesystem` approach, but would welcome 
any feedback or insight. 


