If that's not your intent, then you should be more careful with your replies. When you write something like this:
> While this might work, what I find tricky is that we are forcing this to
> users. Not everybody is interested in putting everything to a bucket and
> server traffic from that. They just don't want to do that. Because reasons.
> They are just happy with what they have etc, it works fine for years and so
> on. They just want to upload SSTables upon snapshotting and call it a day.
>
> I don't think we should force our worldview on them if they are not
> interested in it.

It comes off *extremely* negative. You use the word "force" here multiple times.

On Fri, Mar 7, 2025 at 9:18 AM Štefan Miklošovič <smikloso...@apache.org> wrote:

> I was explaining multiple times (1) that I don't have anything against
> what is discussed here.
>
> Having questions about what that is going to look like does not mean I am
> dismissive.
>
> (1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg
>
> On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>
>> Nobody is saying you can't work with a mount, and this isn't a
>> conversation about snapshots.
>>
>> Nobody is forcing users to use object storage either.
>>
>> You're making a ton of negative assumptions here about both the
>> discussion, and the people you're having it with. Try to be more open
>> minded.
>>
>> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> The only way I see that working is that, if everything was in a bucket,
>>> if you take a snapshot, these SSTables would be "copied" from live data dir
>>> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
>>> would need to say "and if you go to take a snapshot on this table, instead
>>> of hardlinking these SSTables, do a copy". But this "copying" would be
>>> internal to a bucket itself. We would not need to "upload" from node's
>>> machine to s3.
>>>
>>> While this might work, what I find tricky is that we are forcing this to
>>> users.
>>> Not everybody is interested in putting everything to a bucket and
>>> server traffic from that. They just don't want to do that. Because reasons.
>>> They are just happy with what they have etc, it works fine for years and so
>>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>>
>>> I don't think we should force our worldview on them if they are not
>>> interested in it.
>>>
>>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <smikloso...@apache.org>
>>> wrote:
>>>
>>>> BTW, snapshots are quite special because these are not "files", they
>>>> are just hard links. They "materialize" as regular files once underlying
>>>> SSTables are compacted away. How are you going to hardlink from local
>>>> storage to an object storage anyway? We will always need to "upload".
>>>>
>>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <smikloso...@apache.org>
>>>> wrote:
>>>>
>>>>> Jon,
>>>>>
>>>>> all "big three" support mounting a bucket locally. That being said, I
>>>>> do not think that completely ditching this possibility for Cassandra
>>>>> working with a mount, e.g. for just uploading snapshots there etc, is
>>>>> reasonable.
>>>>>
>>>>> GCP
>>>>>
>>>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>>>
>>>>> Azure (this one is quite sophisticated), lot of options ...
>>>>>
>>>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>>>
>>>>> S3, lot of options how to mount that
>>>>>
>>>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>>>
>>>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com>
>>>>> wrote:
>>>>>
>>>>>> Assuming everything else is identical, might not matter for S3.
>>>>>> However, not every object store has a filesystem mount.
>>>>>>
>>>>>> Regarding sprawling dependencies, we can always make the provider
>>>>>> specific libraries available as a separate download and put them on their
>>>>>> own thread with a separate class path. I think in JVM dtest does this
>>>>>> already. Someone just started asking about IAM for login, it sounds like a
>>>>>> similar problem.
>>>>>>
>>>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>>>>>
>>>>>>> I think another way of saying what Stefan may be getting at is what
>>>>>>> does a library give us that an appropriately configured mount dir
>>>>>>> doesn’t?
>>>>>>>
>>>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>>>> achieved easily with config. Is there some other benefit of direct
>>>>>>> integration? Well defined exceptions if we need to distinguish cases is
>>>>>>> one that maybe springs to mind but perhaps there are others?
>>>>>>>
>>>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> That is cool but this still does not show / explain how it would
>>>>>>> look like when it comes to dependencies needed for actually talking to
>>>>>>> storages like s3.
>>>>>>>
>>>>>>> Maybe I am missing something here and please explain when I am
>>>>>>> mistaken but If I understand that correctly, for talking to s3 we would
>>>>>>> need to use a library like this, right? (1). So that would be added among
>>>>>>> Cassandra dependencies? Hence Cassandra starts to be biased against s3?
>>>>>>> Why s3? Every time somebody comes up with a new remote storage support,
>>>>>>> that would be added to classpath as well? How are these dependencies
>>>>>>> going to play with each other and with Cassandra in general? Will all
>>>>>>> these storage provider libraries for arbitrary clouds be even compatible
>>>>>>> with Cassandra licence-wise?
>>>>>>>
>>>>>>> I am sorry I keep repeating these questions but this part of that I
>>>>>>> just don't get at all.
>>>>>>>
>>>>>>> We can indeed add an API for this, sure sure, why not. But for
>>>>>>> people who do not want to deal with this at all and just be OK with a FS
>>>>>>> mounted, why would we block them doing that?
>>>>>>>
>>>>>>> (1)
>>>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>>>
>>>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> .
>>>>>>>>
>>>>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>>>> But if others are interested in contributing a feature like this, I’d
>>>>>>>>> see it as valuable for the project and would be happy to collaborate on
>>>>>>>>> design/architecture/goals.
>>>>>>>>
>>>>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>>>>> offer.
>>>>>>>>
>>>>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>>>>> quickly throw something over the wall this is it:
>>>>>>>>
>>>>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>>>>
>>>>>>>> (with a few friend classes under o.a.c.io.util)
>>>>>>>>
>>>>>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>>>>>> implements that and also provides the RemoteFileSystemProvider that
>>>>>>>> Jake refers to.
>>>>>>>>
>>>>>>>> Hopefully that's a start to get people thinking about CEP level
>>>>>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>>>>>> friends to offer.