anjakefala commented on issue #33618: URL: https://github.com/apache/arrow/issues/33618#issuecomment-1423158795
I wrote a first draft (it does not compile yet) and ran into a few complications:

* `readdir` expects a directory stream pointer and returns the next directory entry in that stream: https://man7.org/linux/man-pages/man3/readdir.3.html. A directory stream is obtained from a function such as `opendir`: https://man7.org/linux/man-pages/man3/opendir.3.html.

From these it seems that `readdir` is not a one-to-one replacement for `stat`, which accepts a path to a specific entry. My understanding is that we cannot give `readdir` a path to an entry and ask whether that entry is a file or a directory. It is usually used by looping over the entries of a directory and checking what each one is, e.g. https://stackoverflow.com/questions/39429803/how-to-list-first-level-directories-only-in-c/39430337#39430337.

There is probably a way to use `readdir` to get the information we want, but before I invest more time in that, I wanted to suggest an alternative approach. It might be simpler to use `std::filesystem::is_directory` (https://en.cppreference.com/w/cpp/filesystem/is_directory) and `std::filesystem::is_regular_file`, both of which are available since C++17. My understanding is that Apache Arrow builds with a C++17 compiler, and that these functions are faster than `stat` (though, of course, I will verify this).

I am going to explore the `std::filesystem` approach, but I welcome feedback or insight.
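
For comparison, here is a minimal sketch of the three approaches mentioned above, assuming a POSIX system and a C++17 compiler. It is not Arrow code: `EntryKind`, `ClassifyWithStat`, `ClassifyWithFilesystem`, and `ListSubdirectories` are hypothetical names made up for illustration. The sketch shows that `stat` and `std::filesystem::status` both answer the question for a single path, while `readdir` only yields entry names from a directory that has already been opened.

```cpp
#include <dirent.h>    // opendir, readdir, closedir (POSIX)
#include <sys/stat.h>  // stat, S_ISDIR, S_ISREG (POSIX)

#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical result type used only in this sketch.
enum class EntryKind { kFile, kDirectory, kOther, kNotFound };

// Per-path check with stat(). For brevity, any failure (not only a
// missing entry) is reported as kNotFound.
EntryKind ClassifyWithStat(const std::string& path) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0) return EntryKind::kNotFound;
  if (S_ISDIR(st.st_mode)) return EntryKind::kDirectory;
  if (S_ISREG(st.st_mode)) return EntryKind::kFile;
  return EntryKind::kOther;
}

// Per-path check with std::filesystem. A single status() call is queried
// for both properties, and the error_code overload avoids exceptions.
EntryKind ClassifyWithFilesystem(const std::filesystem::path& path) {
  std::error_code ec;
  const std::filesystem::file_status status =
      std::filesystem::status(path, ec);
  if (ec || !std::filesystem::exists(status)) return EntryKind::kNotFound;
  if (std::filesystem::is_directory(status)) return EntryKind::kDirectory;
  if (std::filesystem::is_regular_file(status)) return EntryKind::kFile;
  return EntryKind::kOther;
}

// readdir() walks an open directory stream and returns entry names, so it
// fits "list the children of a directory" rather than "classify this path".
// Each name is classified here with the stat-based helper above.
std::vector<std::string> ListSubdirectories(const std::string& dir_path) {
  std::vector<std::string> result;
  DIR* dir = opendir(dir_path.c_str());
  if (dir == nullptr) return result;  // could not open: return empty list
  while (struct dirent* entry = readdir(dir)) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") continue;
    if (ClassifyWithStat(dir_path + "/" + name) == EntryKind::kDirectory) {
      result.push_back(name);
    }
  }
  closedir(dir);
  return result;
}
```

With these helpers, `ClassifyWithFilesystem("some/path")` can replace `ClassifyWithStat("some/path")` call for call, whereas `ListSubdirectories` needs the parent directory instead. On POSIX platforms the common standard library implementations build `std::filesystem::status` on top of `stat`, so any speed difference would need to be measured, not assumed.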
