ArianaVillegas commented on a change in pull request #12625:
URL: https://github.com/apache/arrow/pull/12625#discussion_r841070369
##########
File path: cpp/src/arrow/dataset/discovery.cc
##########
@@ -134,8 +135,26 @@ Result<std::shared_ptr<DatasetFactory>> FileSystemDatasetFactory::Make(
Result<std::shared_ptr<DatasetFactory>> FileSystemDatasetFactory::Make(
std::shared_ptr<fs::FileSystem> filesystem, const std::vector<fs::FileInfo>& files,
std::shared_ptr<FileFormat> format, FileSystemFactoryOptions options) {
+ // Discover files in directories and globs
+ std::vector<fs::FileInfo> discovered_files;
+ for (const auto& file : files) {
+ if (file.IsDirectory()) {
+ fs::FileSelector file_selector;
+ file_selector.base_dir = file.dir_name();
+ file_selector.recursive = true;
+ ARROW_ASSIGN_OR_RAISE(auto folder_files, filesystem->GetFileInfo(file_selector));
+ std::move(folder_files.begin(), folder_files.end(),
+ std::back_inserter(discovered_files));
+ } else if (file.IsGlob()) {
+ ARROW_ASSIGN_OR_RAISE(auto files, filesystem->GetFileInfoGlob(file));
+ std::move(files.begin(), files.end(), std::back_inserter(discovered_files));
+ } else if (file.IsFile()) {
+ discovered_files.emplace_back(file);
+ }
+ }
+
Review comment:
Yes, I'm thinking of adding a `FileType` enum field to `FileSelector`, something like this:
```cpp
struct ARROW_EXPORT FileSelector {
  std::string base_dir;
  FileType type;  // proposed enum field, e.g. FileType::File or FileType::Directory
  bool allow_not_found;
  bool recursive;  // still set by both constructors, so it stays a member
  int32_t max_recursion;

  FileSelector()
      : type(FileType::Unknown), allow_not_found(false), recursive(false),
        max_recursion(INT32_MAX) {}
  explicit FileSelector(std::string base_dir)
      : base_dir(std::move(base_dir)), type(FileType::File), allow_not_found(false),
        recursive(false), max_recursion(INT32_MAX) {}
};
```
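To make the idea concrete, here is a self-contained sketch of how a `type` field could steer discovery. Everything in it is hypothetical: `FileType` and `FileSelector` are simplified stand-ins (not Arrow's real declarations), `Discover` is an invented helper, and a plain list of `/`-separated paths replaces a real `fs::FileSystem`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins; NOT the actual arrow::fs declarations.
enum class FileType : int8_t { NotFound, Unknown, File, Directory };

struct FileSelector {
  std::string base_dir;
  FileType type = FileType::Unknown;
  bool recursive = false;
};

// Hypothetical helper: expands one selector against an in-memory path list.
// File: returns base_dir itself if it is present.
// Directory: returns paths under base_dir, descending into nested
// subdirectories only when `recursive` is set.
// Any other type yields nothing.
std::vector<std::string> Discover(const FileSelector& sel,
                                  const std::vector<std::string>& all_files) {
  std::vector<std::string> out;
  if (sel.type == FileType::File) {
    for (const auto& p : all_files) {
      if (p == sel.base_dir) out.push_back(p);
    }
  } else if (sel.type == FileType::Directory) {
    const std::string prefix = sel.base_dir + "/";
    for (const auto& p : all_files) {
      if (p.compare(0, prefix.size(), prefix) != 0) continue;  // not under base_dir
      // Another '/' after the prefix means the file sits in a subdirectory.
      const bool nested = p.find('/', prefix.size()) != std::string::npos;
      if (sel.recursive || !nested) out.push_back(p);
    }
  }
  return out;
}
```

In this sketch `type` only says what `base_dir` is, while `recursive` remains an independent knob, which mirrors keeping both members in the struct above.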