A failing remote API that you are calling and a failing filesystem you are using have different implications.

Kind Regards,
Brandon
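To make Brandon's distinction concrete, here is a minimal Java sketch of the two failure surfaces. ObjectStoreClient and RemoteStorageException are made-up stand-ins (not Cassandra or AWS SDK types) and the retry policy is only illustrative: a direct API call fails with an obviously remote error that can be retried or throttled, while the same outage behind a mounted bucket arrives as an ordinary filesystem IOException that generic disk-failure handling cannot distinguish from a dying local drive.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Hypothetical stand-ins for an object-store SDK; not real Cassandra or AWS types.
    class RemoteStorageException extends IOException
    {
        RemoteStorageException(String msg) { super(msg); }
    }

    interface ObjectStoreClient
    {
        byte[] get(String key) throws RemoteStorageException;
    }

    final class FailureSurfaceSketch
    {
        // Direct API call: the failure is visibly remote, so retry/backoff is a natural
        // response and the node's own disks are never implicated.
        static byte[] readViaApi(ObjectStoreClient client, String key) throws RemoteStorageException
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return client.get(key);
                }
                catch (RemoteStorageException e)
                {
                    if (attempt == 3)
                        throw e;
                }
            }
        }

        // Mounted bucket: the same outage arrives as a plain IOException from a "local" path,
        // indistinguishable from a failing disk to any generic disk-failure policy.
        static byte[] readViaMount(Path fileOnMountedBucket) throws IOException
        {
            try (InputStream in = Files.newInputStream(fileOnMountedBucket))
            {
                return in.readAllBytes();
            }
        }
    }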
On Tue, Mar 4, 2025 at 7:47 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>
> I don't say that using remote object storage is useless.
>
> I am just saying that I don't see the difference. I have not measured it, but I can imagine that a mounted S3 bucket would use, under the hood, the same calls to the S3 API. How else would it be done? You need to talk to the remote S3 storage eventually anyway. So why does it matter whether we call the S3 API from Java or by some other means from an "S3 driver"? It ends up using the same thing, no?
>
> On Tue, Mar 4, 2025 at 12:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> Mounting an S3 bucket as a directory is an easy but poor implementation of object-backed storage for databases.
>>
>> Object storage is durable (most data loss is due to bugs, not concurrent hardware failures), cheap (it can be 5-10x cheaper), and ubiquitous. A huge number of modern systems are object-storage-only because the approximately infinite scale / cost / throughput tradeoffs often make up for the latency.
>>
>> Outright dismissing object storage for Cassandra is short-sighted - it needs to be done in a way that makes sense, not by blindly copying the block access patterns over to object storage.
>>
>> On Mar 4, 2025, at 11:19 AM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>
>> I do not think we need this CEP, honestly. I don't want to diss this unnecessarily, but if you mount remote storage locally (e.g. mounting an S3 bucket as if it were any other directory on the node's machine), then what is this CEP good for?
>>
>> That is not even mentioning the necessity of putting all the dependencies needed to talk to the respective remote storage on Cassandra's classpath, introducing potential problems with dependencies and their possible incompatibilities / different versions, etc.
>>
>> On Thu, Feb 27, 2025 at 6:21 AM C. Scott Andreas <sc...@paradoxica.net> wrote:
>>>
>>> I’d love to see this implemented — where “this” is a proxy for some notion of support for remote object storage, perhaps usable by compaction strategies like TWCS to migrate data older than a threshold from a local filesystem to remote object storage.
>>>
>>> It’s not an area where I can currently dedicate engineering effort. But if others are interested in contributing a feature like this, I’d see it as valuable for the project and would be happy to collaborate on design/architecture/goals.
>>>
>>> – Scott
>>>
>>> On Feb 26, 2025, at 6:56 AM, guo Maxwell <cclive1...@gmail.com> wrote:
>>>
>>> Is anyone else interested in continuing to discuss this topic?
>>>
>>> On Fri, Sep 20, 2024 at 9:44 AM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>
>>>> I discussed this offline with Claude; he is no longer working on this.
>>>>
>>>> It's a pity. I think this is a very valuable thing. Commitlog archiving and restore may be able to reuse the relevant code if it is completed.
>>>>
>>>> On Fri, Sep 20, 2024 at 2:01 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>
>>>>> Thanks for reviving this one!
>>>>>
>>>>> On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>>
>>>>>> Is there any update on this topic? It seems that things could make big progress if Jake Luciani can find someone who can make the FileSystemProvider code accessible.
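Since the thread keeps returning to a FileSystemProvider-based approach, a minimal sketch of that plumbing may help; it also speaks to Štefan's question above about calling the S3 API "from Java or by other means". The sketch assumes some third-party S3 FileSystemProvider is registered on the classpath for the "s3" scheme, and the bucket and file names are placeholders: the database code keeps using the standard java.nio.file API, and whichever provider claims the URI scheme performs the remote calls.

    import java.net.URI;
    import java.nio.channels.SeekableByteChannel;
    import java.nio.file.FileSystem;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Map;

    public class RemoteSSTableReadSketch
    {
        public static void main(String[] args) throws Exception
        {
            // FileSystems delegates to whichever installed FileSystemProvider claims the
            // scheme; "s3://example-bucket/" and the SSTable path below are placeholders.
            URI bucket = URI.create("s3://example-bucket/");
            try (FileSystem remote = FileSystems.newFileSystem(bucket, Map.of()))
            {
                Path sstable = remote.getPath("/ks/tbl/nb-1-big-Data.db");
                // Same channel-oriented java.nio.file API a local data directory is read with.
                try (SeekableByteChannel channel = Files.newByteChannel(sstable, StandardOpenOption.READ))
                {
                    System.out.println("component size = " + channel.size() + " bytes");
                }
            }
        }
    }

Whether such a provider issues a blocking S3 request per read, caches locally, or batches range reads is exactly the design question Jeff raises about not copying block access patterns onto object storage.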
>>>>>> On Sat, Dec 16, 2023 at 5:29 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>>
>>>>>>> At a high level I really like the idea of being able to better leverage cheaper storage, especially object stores like S3.
>>>>>>>
>>>>>>> One important thing though - I feel pretty strongly that there's a big, deal-breaking downside. Backups, disk failure policies, snapshots, and possibly repairs (which haven't been particularly great in the past) would get more complicated, and of course there's the issue of failure recovery being only partially possible if you're looking at a durable block store paired with an ephemeral one, with some of your data not replicated to the cold side. That introduces a failure case that's unacceptable for most teams, and it results in needing to implement potentially two different backup solutions. This is operationally complex, with a lot of surface area for headaches. I think a lot of teams would probably have an issue with the big question mark around durability, and I would probably avoid it myself.
>>>>>>>
>>>>>>> On the other hand, I'm +1 if we approach it slightly differently - where _all_ the data is located on the cold storage, with the local hot storage used as a cache. This means we can use the cold directories for the complete dataset, simplifying backups and node replacements.
>>>>>>>
>>>>>>> For a little background, we had a ticket several years ago where I pointed out it was possible to do this *today* at the operating system level, as long as you're using block devices (vs an object store) and LVM [1]. For example, this works well with GP3 EBS with low IOPS provisioning plus local NVMe to get a nice balance of great read performance without going nuts on the cost for IOPS. I also wrote about this in a little more detail on my blog [2]. There's also the new Mountpoint tech in AWS, which does pretty much exactly what I've suggested above [3] and is probably worth evaluating just to get a feel for it.
>>>>>>>
>>>>>>> I'm not insisting we require LVM or the AWS S3 filesystem, since that would rule out other cloud providers, but I am pretty confident that the entire dataset should reside on the "cold" side of things for the practical and technical reasons I listed above. I don't think it massively changes the proposal, and it should simplify things for everyone.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
>>>>>>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
>>>>>>> [3] https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
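A rough sketch of the read path Jon describes, where the cold store holds the complete dataset and the local disk is only a cache. RemoteObjectStore, CachingComponentFetcher, and the naming are invented for illustration; eviction, concurrency control, and range reads of large components are deliberately left out.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Hypothetical client for the authoritative "cold" object store.
    interface RemoteObjectStore
    {
        InputStream get(String key) throws IOException;
    }

    final class CachingComponentFetcher
    {
        private final Path cacheDir;          // local NVMe directory acting purely as a cache
        private final RemoteObjectStore cold; // holds the complete dataset

        CachingComponentFetcher(Path cacheDir, RemoteObjectStore cold)
        {
            this.cacheDir = cacheDir;
            this.cold = cold;
        }

        // Returns a local path for an SSTable component, pulling it from cold storage on a miss.
        Path fetch(String componentName) throws IOException
        {
            Path local = cacheDir.resolve(componentName);
            if (Files.exists(local))
                return local;                                  // cache hit

            Path partial = Files.createTempFile(cacheDir, "fetch-", ".part");
            try (InputStream in = cold.get(componentName))
            {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            }
            // Rename into place so concurrent readers never observe a half-written file.
            return Files.move(partial, local, StandardCopyOption.ATOMIC_MOVE);
        }
    }

Because the cold copy is complete, a replacement node can start with an empty cache and hydrate it lazily, which is the simplification to backups and node replacement Jon points at.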
>>>>>>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren <cla...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Is there still interest in this? Can we get some points down on electrons so that we all understand the issues?
>>>>>>>>
>>>>>>>> While it is fairly simple to redirect reads/writes to something other than the local system for a single node, this will not solve the problem for tiered storage.
>>>>>>>>
>>>>>>>> Tiered storage will require that on each read/write the primary key be assessed to determine whether the read/write should be redirected. My reasoning for this statement is that in a cluster with a replication factor greater than 1, a node will store data for the keys that would be allocated to it in a cluster with a replication factor of 1, as well as some keys from nodes earlier in the ring.
>>>>>>>>
>>>>>>>> Even if we can get the primary keys for all the data we want to write to "cold storage" to map to a single node, a replication factor > 1 means that data will also be placed in "normal storage" on subsequent nodes.
>>>>>>>>
>>>>>>>> To overcome this, we have to explore ways to route data to different storage based on the keys, and that different storage may have to be available on _all_ the nodes.
>>>>>>>>
>>>>>>>> Have any of the partial solutions mentioned in this email chain (or others) solved this problem?
>>>>>>>>
>>>>>>>> Claude
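To see why Claude's replication point forces the routing decision onto every node, here is a tiny self-contained ring model; the node names, token values, and placement rule are simplified stand-ins for Cassandra's real replication strategies. With RF = 3, any key's data lands on three consecutive nodes, so a "cold keys live on one cold node" scheme cannot hold.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    // Tiny ring model illustrating Claude's point: with replication factor > 1, a key's
    // data lands on several nodes, so a key-based "cold storage" routing decision has to
    // be enforced on every replica. Node names and token values are made up for the example.
    public class ReplicaPlacementSketch
    {
        public static void main(String[] args)
        {
            // token -> node, i.e. a very small ring
            TreeMap<Long, String> ring = new TreeMap<>();
            ring.put(0L, "node1");
            ring.put(25L, "node2");
            ring.put(50L, "node3");
            ring.put(75L, "node4");

            long keyToken = 60L;   // some partition key hashing to token 60
            int rf = 3;

            System.out.println("replicas for token " + keyToken + ": " + replicas(ring, keyToken, rf));
            // prints: replicas for token 60: [node4, node1, node2]
        }

        // Walk clockwise from the first token >= the key's token, wrapping around the ring.
        static List<String> replicas(TreeMap<Long, String> ring, long token, int rf)
        {
            List<String> result = new ArrayList<>();
            Long t = ring.ceilingKey(token);
            while (result.size() < rf)
            {
                if (t == null)
                    t = ring.firstKey();           // wrap around
                result.add(ring.get(t));
                t = ring.higherKey(t);
            }
            return result;
        }
    }

Which is why any tiering scheme (for example compaction-driven migration such as the TWCS idea Scott mentions) has to make the hot/cold decision locally on each replica, rather than by dedicating particular nodes to cold data.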