I have explained multiple times (1) that I don't have anything against what is being discussed here.
Having questions about what that is going to look like does not mean I am dismissive.

(1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg

On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Nobody is saying you can't work with a mount, and this isn't a
> conversation about snapshots.
>
> Nobody is forcing users to use object storage either.
>
> You're making a ton of negative assumptions here about both the
> discussion, and the people you're having it with. Try to be more open
> minded.
>
>
> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> The only way I see that working is that, if everything was in a bucket,
>> if you take a snapshot, these SSTables would be "copied" from live data dir
>> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
>> would need to say "and if you go to take a snapshot on this table, instead
>> of hardlinking these SSTables, do a copy". But this "copying" would be
>> internal to a bucket itself. We would not need to "upload" from node's
>> machine to s3.
>>
>> While this might work, what I find tricky is that we are forcing this to
>> users. Not everybody is interested in putting everything to a bucket and
>> server traffic from that. They just don't want to do that. Because reasons.
>> They are just happy with what they have etc, it works fine for years and so
>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>
>> I don't think we should force our worldview on them if they are not
>> interested in it.
>>
>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> BTW, snapshots are quite special because these are not "files", they are
>>> just hard links. They "materialize" as regular files once underlying
>>> SSTables are compacted away. How are you going to hardlink from local
>>> storage to an object storage anyway? We will always need to "upload".
>>>
>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič
>>> <smikloso...@apache.org> wrote:
>>>
>>>> Jon,
>>>>
>>>> all "big three" support mounting a bucket locally. That being said, I
>>>> do not think that completely ditching this possibility for Cassandra
>>>> working with a mount, e.g. for just uploading snapshots there etc, is
>>>> reasonable.
>>>>
>>>> GCP
>>>>
>>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>>
>>>> Azure (this one is quite sophisticated), lot of options ...
>>>>
>>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>>
>>>> S3, lot of options how to mount that
>>>>
>>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>>
>>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com>
>>>> wrote:
>>>>
>>>>> Assuming everything else is identical, might not matter for S3.
>>>>> However, not every object store has a filesystem mount.
>>>>>
>>>>> Regarding sprawling dependencies, we can always make the provider
>>>>> specific libraries available as a separate download and put them on
>>>>> their own thread with a separate class path. I think in JVM dtest does
>>>>> this already. Someone just started asking about IAM for login, it
>>>>> sounds like a similar problem.
>>>>>
>>>>>
>>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>>>>
>>>>>> I think another way of saying what Stefan may be getting at is what
>>>>>> does a library give us that an appropriately configured mount dir
>>>>>> doesn’t?
>>>>>>
>>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>>> achieved easily with config. Is there some other benefit of direct
>>>>>> integration? Well defined exceptions if we need to distinguish cases
>>>>>> is one that maybe springs to mind but perhaps there are others?
>>>>>>
>>>>>>
>>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> That is cool but this still does not show / explain how it would look
>>>>>> like when it comes to dependencies needed for actually talking to
>>>>>> storages like s3.
>>>>>>
>>>>>> Maybe I am missing something here and please explain when I am
>>>>>> mistaken but If I understand that correctly, for talking to s3 we
>>>>>> would need to use a library like this, right? (1). So that would be
>>>>>> added among Cassandra dependencies? Hence Cassandra starts to be
>>>>>> biased against s3? Why s3? Every time somebody comes up with a new
>>>>>> remote storage support, that would be added to classpath as well? How
>>>>>> are these dependencies going to play with each other and with
>>>>>> Cassandra in general? Will all these storage provider libraries for
>>>>>> arbitrary clouds be even compatible with Cassandra licence-wise?
>>>>>>
>>>>>> I am sorry I keep repeating these questions but this part of that I
>>>>>> just don't get at all.
>>>>>>
>>>>>> We can indeed add an API for this, sure sure, why not. But for people
>>>>>> who do not want to deal with this at all and just be OK with a FS
>>>>>> mounted, why would we block them doing that?
>>>>>>
>>>>>> (1)
>>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>>
>>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>>> But if others are interested in contributing a feature like this,
>>>>>>>> I’d see it as valuable for the project and would be happy to
>>>>>>>> collaborate on design/architecture/goals.
>>>>>>>
>>>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>>>> offer.
>>>>>>>
>>>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>>>> quickly throw something over the wall this is it:
>>>>>>>
>>>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>>>
>>>>>>> (with a few friend classes under o.a.c.io.util)
>>>>>>>
>>>>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>>>>> implements that and also provides the RemoteFileSystemProvider that
>>>>>>> Jake refers to.
>>>>>>>
>>>>>>> Hopefully that's a start to get people thinking about CEP level
>>>>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>>>>> friends to offer.
>>>>>>>