Tom-Newton commented on code in PR #39298:
URL: https://github.com/apache/arrow/pull/39298#discussion_r1432644414
##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -798,7 +864,7 @@ class AzureFileSystem::Impl {
std::unique_ptr<DataLake::DataLakeServiceClient> datalake_service_client_;
std::unique_ptr<Blobs::BlobServiceClient> blob_service_client_;
- internal::HierarchicalNamespaceDetector hns_detector_;
+ HNSSupport cached_hns_support_ = HNSSupport::kUnknown;
Review Comment:
To be honest I'm not really convinced by either of these, but it probably
doesn't matter.
Technically you could avoid the call to check for hierarchical NS but I
don't think it really makes sense because of the very small performance cost
and the high added complexity cost. `DataLakeFileSystemClient` largely works on
flat NS accounts so it would require extra error handling on every call and as
we know from the hierarchical namespace detection code Azure often gives
strange responses and its difficult to determine if the failure is genuine or
if the error is just because hierarchical namespace is detected.
I don't really see why we would need to protects other member variables of
`AzureFileSystem::Impl` behind the same mutex. I would have kept the separate
detector class which could have just one mutex. When `AzureFileSystem::Impl`
needs to check whether hierarchical NS is enabled in parallel it would hit the
mutex but otherwise I would want `AzureFileSystem::Impl` to be free to make
other calls to blob storage without any mutexs getting in the way.
Additionally, I was under the impression that `AzureFileSystem` would never be
used in multiple threads. I think only `RandomAccessFile::ReadAt` is used
concurrently.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]