Nobody is saying you can't work with a mount, and this isn't a conversation about snapshots.
Nobody is forcing users to use object storage either. You're making a ton of negative assumptions here about both the discussion and the people you're having it with. Try to be more open-minded.

On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič <smikloso...@apache.org> wrote:

> The only way I see that working is that, if everything was in a bucket, if
> you take a snapshot, these SSTables would be "copied" from the live data dir
> (living in a bucket) to the snapshots dir (living in a bucket). Basically, we
> would need to say "and if you go to take a snapshot on this table, instead
> of hardlinking these SSTables, do a copy". But this "copying" would be
> internal to the bucket itself. We would not need to "upload" from the node's
> machine to s3.
>
> While this might work, what I find tricky is that we are forcing this on
> users. Not everybody is interested in putting everything in a bucket and
> serving traffic from it. They just don't want to do that. Because reasons.
> They are just happy with what they have etc, it has worked fine for years
> and so on. They just want to upload SSTables upon snapshotting and call it
> a day.
>
> I don't think we should force our worldview on them if they are not
> interested in it.
>
> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> BTW, snapshots are quite special because these are not "files", they are
>> just hard links. They "materialize" as regular files once the underlying
>> SSTables are compacted away. How are you going to hardlink from local
>> storage to an object storage anyway? We will always need to "upload".
>>
>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> Jon,
>>>
>>> all "big three" support mounting a bucket locally. That being said, I do
>>> not think that completely ditching this possibility for Cassandra working
>>> with a mount, e.g. for just uploading snapshots there etc, is reasonable.
>>>
>>> GCP
>>>
>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>
>>> Azure (this one is quite sophisticated), lots of options ...
>>>
>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>
>>> S3, lots of options for how to mount that
>>>
>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>
>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com>
>>> wrote:
>>>
>>>> Assuming everything else is identical, might not matter for S3.
>>>> However, not every object store has a filesystem mount.
>>>>
>>>> Regarding sprawling dependencies, we can always make the
>>>> provider-specific libraries available as a separate download and put
>>>> them on their own thread with a separate class path. I think in-JVM
>>>> dtest does this already. Someone just started asking about IAM for
>>>> login, it sounds like a similar problem.
>>>>
>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>>>
>>>>> I think another way of saying what Stefan may be getting at is what
>>>>> does a library give us that an appropriately configured mount dir
>>>>> doesn’t?
>>>>>
>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>> achieved easily with config. Is there some other benefit of direct
>>>>> integration? Well defined exceptions if we need to distinguish cases is
>>>>> one that maybe springs to mind but perhaps there are others?
>>>>>
>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>>>> wrote:
>>>>>
>>>>> That is cool but this still does not show / explain what it would look
>>>>> like when it comes to dependencies needed for actually talking to
>>>>> storages like s3.
>>>>>
>>>>> Maybe I am missing something here, and please explain if I am mistaken,
>>>>> but if I understand correctly, for talking to s3 we would need to use a
>>>>> library like this, right? (1).
>>>>> So that would be added among Cassandra dependencies? Hence Cassandra
>>>>> starts to be biased toward s3? Why s3? Every time somebody comes up with
>>>>> a new remote storage support, that would be added to the classpath as
>>>>> well? How are these dependencies going to play with each other and with
>>>>> Cassandra in general? Will all these storage provider libraries for
>>>>> arbitrary clouds even be compatible with Cassandra licence-wise?
>>>>>
>>>>> I am sorry I keep repeating these questions, but this is the part that I
>>>>> just don't get at all.
>>>>>
>>>>> We can indeed add an API for this, sure sure, why not. But for people
>>>>> who do not want to deal with this at all and are just OK with a mounted
>>>>> FS, why would we block them from doing that?
>>>>>
>>>>> (1)
>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>
>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>>>>>
>>>>>> .
>>>>>>
>>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>> But if others are interested in contributing a feature like this, I’d
>>>>>>> see it as valuable for the project and would be happy to collaborate on
>>>>>>> design/architecture/goals.
>>>>>>
>>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>>> offer.
>>>>>>
>>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>>> quickly throw something over the wall this is it:
>>>>>>
>>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>>
>>>>>> (with a few friend classes under o.a.c.io.util)
>>>>>>
>>>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>>>>> refers to.
>>>>>>
>>>>>> Hopefully that's a start to get people thinking about CEP level
>>>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>>>> friends to offer.
>>>>>>