Jon, all of the "big three" support mounting a bucket locally. Given that, I do not think it is reasonable to completely ditch the possibility of Cassandra working with a mount, e.g. just for uploading snapshots there.
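To make the mount case concrete, here is a minimal sketch of what "just uploading snapshots" to a mounted bucket could look like. It assumes the bucket is already mounted as a directory by one of the FUSE tools linked in this mail. The class name SnapshotToMount and both paths are hypothetical; this is not existing Cassandra code. The point is only that plain java.nio.file calls are enough and no provider-specific SDK ends up on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Cassandra: copies a snapshot directory
// into a directory that is assumed to be a locally mounted bucket.
final class SnapshotToMount {

    /**
     * Copies every regular file under snapshotDir into targetDir,
     * preserving relative paths. Returns the number of files copied.
     */
    static int copySnapshot(Path snapshotDir, Path targetDir) throws IOException {
        List<Path> files;
        // Collect the file list first so the directory stream is closed early.
        try (Stream<Path> walk = Files.walk(snapshotDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path source : files) {
            Path relative = snapshotDir.relativize(source);
            Path destination = targetDir.resolve(relative.toString());
            Path parent = destination.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            // On a FUSE mount this write is what becomes an object upload.
            Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: snapshot directory, args[1]: mount point (both hypothetical).
        Path snapshot = Paths.get(args[0]);
        Path mount = Paths.get(args[1]);
        System.out.println("copied " + copySnapshot(snapshot, mount) + " files");
    }
}
```

What this does not give you is what Benedict mentions below: errors from the store surface as generic IOExceptions, and any S3-specific behaviour has to be handled through the mount's configuration rather than in code.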
GCP
https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket

Azure (this one is quite sophisticated), lot of options ...
https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL

S3, lot of options how to mount that
https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system

On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Assuming everything else is identical, might not matter for S3. However,
> not every object store has a filesystem mount.
>
> Regarding sprawling dependencies, we can always make the provider specific
> libraries available as a separate download and put them on their own thread
> with a separate class path. I think in JVM dtest does this already.
> Someone just started asking about IAM for login, it sounds like a similar
> problem.
>
> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>
>> I think another way of saying what Stefan may be getting at is what does
>> a library give us that an appropriately configured mount dir doesn’t?
>>
>> We don’t want to treat S3 the same as local disk, but this can be
>> achieved easily with config. Is there some other benefit of direct
>> integration? Well defined exceptions if we need to distinguish cases is one
>> that maybe springs to mind but perhaps there are others?
>>
>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> That is cool but this still does not show / explain how it would look
>> like when it comes to dependencies needed for actually talking to storages
>> like s3.
>>
>> Maybe I am missing something here and please explain when I am mistaken
>> but If I understand that correctly, for talking to s3 we would need to use
>> a library like this, right? (1). So that would be added among Cassandra
>> dependencies? Hence Cassandra starts to be biased against s3? Why s3? Every
>> time somebody comes up with a new remote storage support, that would be
>> added to classpath as well? How are these dependencies going to play with
>> each other and with Cassandra in general? Will all these storage
>> provider libraries for arbitrary clouds be even compatible with Cassandra
>> licence-wise?
>>
>> I am sorry I keep repeating these questions but this part of that I just
>> don't get at all.
>>
>> We can indeed add an API for this, sure sure, why not. But for people who
>> do not want to deal with this at all and just be OK with a FS mounted, why
>> would we block them doing that?
>>
>> (1)
>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>
>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>>
>>> .
>>>
>>>> It’s not an area where I can currently dedicate engineering effort. But
>>>> if others are interested in contributing a feature like this, I’d see it as
>>>> valuable for the project and would be happy to collaborate on
>>>> design/architecture/goals.
>>>
>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>
>>> None of us at DataStax has gotten around to providing that, but to
>>> quickly throw something over the wall this is it:
>>>
>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>
>>> (with a few friend classes under o.a.c.io.util)
>>>
>>> We then have a RemoteStorageProvider, private in another repo, that
>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>> refers to.
>>>
>>> Hopefully that's a start to get people thinking about CEP level details,
>>> while we get a cleaned abstract of RemoteStorageProvider and friends to
>>> offer.