Tom-Newton opened a new issue, #49043:
URL: https://github.com/apache/arrow/issues/49043
### Describe the bug, including details regarding any error messages, version, and platform.
We found an obscure edge case in a production use case where the Arrow
`AzureFileSystem::GetFileInfo` incorrectly returns `FileType::NotFound`.
I think it will be very hard to create a reproduction that doesn't
need to read our Azure storage account, but I have identified the root cause.
`GetFileInfo` on flat namespace Azure blob storage accounts does the
following:
```
Blobs::ListBlobsOptions options;
options.Prefix = internal::RemoveTrailingSlash(location.path);
options.PageSizeHint = 1;
auto list_response = container_client.ListBlobsByHierarchy("/", options);
if (!list_response.BlobPrefixes.empty()) {
  ...
}
if (!list_response.Blobs.empty()) {
  ...
}
info.set_type(FileType::NotFound);
return info;
```
`list_response` holds only the first page of results; seeing the rest
requires calling `MoveToNextPage`. The code above checks only the
first page, because the order is guaranteed and we only want the first result.
However, it seems that in some rare cases
`container_client.ListBlobsByHierarchy` can return an empty first page while
later pages are not empty. In that case we incorrectly return `FileType::NotFound`.
I think I can probably make a PR to fix this, but I don't know how to write
a test for it. The issue only occurs when Azure returns an empty page ahead of
non-empty ones, behavior that may well not exist in the Azurite emulator.
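
One possible shape for the fix, sketched under assumptions: keep advancing pages until one is non-empty or no pages remain. The sketch does not use the Azure SDK. `FakePage` and `FakePagedResponse` are hypothetical stand-ins that mimic only the `HasPage()` / `MoveToNextPage()` paging interface and the `BlobPrefixes` / `Blobs` fields. The bodies of the two `if` branches are elided in the snippet above, so here they are simplified to return a type directly. A stand-in like this might also be a way to exercise the empty-first-page case in a unit test without depending on Azurite.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one page of a hierarchical listing.
struct FakePage {
  std::vector<std::string> BlobPrefixes;
  std::vector<std::string> Blobs;
};

// Hypothetical stand-in for the paged response: it exposes the current page's
// fields and a HasPage()/MoveToNextPage() pair, nothing more.
class FakePagedResponse {
 public:
  explicit FakePagedResponse(std::vector<FakePage> pages)
      : pages_(std::move(pages)) {
    Load();
  }

  bool HasPage() const { return index_ < pages_.size(); }

  void MoveToNextPage() {
    assert(HasPage());  // advancing past the end is a caller error
    ++index_;
    Load();
  }

  std::vector<std::string> BlobPrefixes;
  std::vector<std::string> Blobs;

 private:
  // Copy the current page into the public fields, or clear them at the end.
  void Load() {
    if (HasPage()) {
      BlobPrefixes = pages_[index_].BlobPrefixes;
      Blobs = pages_[index_].Blobs;
    } else {
      BlobPrefixes.clear();
      Blobs.clear();
    }
  }

  std::vector<FakePage> pages_;
  std::size_t index_ = 0;
};

enum class FileType { NotFound, File, Directory };

// Mirrors the current behavior: only the first page is inspected.
FileType ClassifyFirstPageOnly(FakePagedResponse response) {
  if (!response.BlobPrefixes.empty()) return FileType::Directory;  // simplified
  if (!response.Blobs.empty()) return FileType::File;              // simplified
  return FileType::NotFound;
}

// Sketch of the fix: skip empty pages and report NotFound only once every
// page has been seen and none contained a result.
FileType ClassifySkippingEmptyPages(FakePagedResponse response) {
  for (; response.HasPage(); response.MoveToNextPage()) {
    if (!response.BlobPrefixes.empty()) return FileType::Directory;  // simplified
    if (!response.Blobs.empty()) return FileType::File;              // simplified
  }
  return FileType::NotFound;
}
```

With an empty first page followed by a page containing one blob, `ClassifyFirstPageOnly` reports `FileType::NotFound` while `ClassifySkippingEmptyPages` reports `FileType::File`. Since ordering is guaranteed, the loop can still stop at the first non-empty page, so the common case should cost no extra requests.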
### Component(s)
C++