pitrou commented on a change in pull request #9995:
URL: https://github.com/apache/arrow/pull/9995#discussion_r611833986
##########
File path: cpp/src/arrow/filesystem/s3fs.cc
##########
@@ -1762,6 +1851,50 @@ Result<std::vector<FileInfo>> S3FileSystem::GetFileInfo(const FileSelector& sele
return results;
}
+FileInfoGenerator S3FileSystem::GetFileInfoGenerator(const FileSelector& select) {
+  auto maybe_base_path = S3Path::FromString(select.base_dir);
+  if (!maybe_base_path.ok()) {
+    return MakeFailingGenerator<FileInfoVector>(maybe_base_path.status());
+  }
+  auto base_path = *std::move(maybe_base_path);
+
+  if (base_path.empty()) {
+    // List all buckets, then possibly recurse
+    PushGenerator<AsyncGenerator<FileInfoVector>> gen;
+    auto producer = gen.producer();
+
+    auto fut = impl_->ListBucketsAsync(io_context());
+    auto impl = impl_->shared_from_this();
+    fut.AddCallback(
+        [producer, select, impl](const Result<std::vector<std::string>>& res) mutable {
+          if (!res.ok()) {
+            producer.Push(res.status());
+            producer.Close();
+            return;
+          }
+          FileInfoVector buckets;
+          for (const auto& bucket : *res) {
+            buckets.push_back(FileInfo{bucket, FileType::Directory});
+          }
+          // Generate all bucket infos
+          producer.Push(MakeVectorGenerator(std::vector<FileInfoVector>{buckets}));
+          if (select.recursive) {
+            // Generate recursive walk for each bucket in turn
+            for (const auto& bucket : buckets) {
+              producer.Push(impl->WalkAsync(select, bucket.path(), ""));
+            }
+          }
+          producer.Close();
+        });
+
+    return MakeConcatenatedGenerator(
Review comment:
The first thing is that I don't understand the merged generator documentation (what does `max_subscriptions` mean? how do I choose its value?). The second thing is that `WalkAsync` is called for each child directory above and schedules a walk, so I don't understand why this would run one bucket at a time.
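
To make the question concrete, here is a minimal sketch (not the PR's code) of how I read the two combinators, assuming the helpers in `arrow/util/async_generator.h` keep the signatures shown; `BucketListing` is a made-up stand-in for a per-bucket walk. As I understand them, `MakeConcatenatedGenerator` drains one inner generator before pulling from the next, while `MakeMergedGenerator` pulls from up to `max_subscriptions` inner generators at once.

```cpp
// Minimal sketch, not the PR's code: assumes the Arrow async generator
// helpers below exist with these signatures (arrow/util/async_generator.h).
#include <iostream>
#include <string>
#include <utility>
#include <vector>

#include <arrow/util/async_generator.h>

using arrow::AsyncGenerator;

// Made-up stand-in for a per-bucket walk; yields two already-ready items.
AsyncGenerator<std::string> BucketListing(const std::string& bucket) {
  return arrow::MakeVectorGenerator<std::string>({bucket + "/a", bucket + "/b"});
}

int main() {
  // Outer generator of inner generators, analogous to the PushGenerator of
  // per-bucket walks in GetFileInfoGenerator.
  std::vector<AsyncGenerator<std::string>> walks = {
      BucketListing("bucket1"), BucketListing("bucket2"), BucketListing("bucket3")};
  auto outer = arrow::MakeVectorGenerator(std::move(walks));

  // Concatenated: the second inner generator is only pulled once the first is
  // exhausted, i.e. results come out one bucket at a time.
  auto concatenated = arrow::MakeConcatenatedGenerator(std::move(outer));
  auto concat_items =
      arrow::CollectAsyncGenerator(concatenated).result().ValueOrDie();

  // Merged: up to max_subscriptions inner generators are pulled from
  // concurrently, so results from different buckets may interleave.
  std::vector<AsyncGenerator<std::string>> walks2 = {
      BucketListing("bucket1"), BucketListing("bucket2"), BucketListing("bucket3")};
  auto merged = arrow::MakeMergedGenerator(
      arrow::MakeVectorGenerator(std::move(walks2)), /*max_subscriptions=*/2);
  auto merged_items = arrow::CollectAsyncGenerator(merged).result().ValueOrDie();

  for (const auto& item : concat_items) std::cout << "concat: " << item << "\n";
  for (const auto& item : merged_items) std::cout << "merged: " << item << "\n";
  return 0;
}
```

If that reading is right, the concatenation above only orders how results are consumed; it does not by itself limit how many walks are in flight, which is what the second question is about.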