If we want to do this, we should wrap the object storage underneath and
expose file system API capabilities upwards to the Cassandra layer, if my
understanding is correct.
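
To sketch what that could mean concretely: assuming the NIO SPI route that
is mentioned further down in this thread (FileSystemProvider), one possible
shape is a provider that maps Path operations onto object-store calls. The
class and method names below are made up for illustration only and are not
from any existing patch:

import java.io.IOException;
import java.net.URI;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.spi.FileSystemProvider;
import java.util.Set;

// Illustrative only: a provider registered for an "s3" scheme so that code
// holding a java.nio.file.Path keeps working while bytes actually move
// to/from an object store. Kept abstract because the real SPI has many more
// methods than are worth showing here.
public abstract class ObjectStoreFileSystemProvider extends FileSystemProvider
{
    @Override
    public String getScheme()
    {
        return "s3"; // e.g. s3://bucket/keyspace/table/sstable-component
    }

    @Override
    public SeekableByteChannel newByteChannel(Path path,
                                              Set<? extends OpenOption> options,
                                              FileAttribute<?>... attrs) throws IOException
    {
        // Map the Path to a bucket/key and return a channel that does ranged
        // GETs on read and uploads on write. This mapping and the channel
        // implementation are where all the real work lives.
        return openObjectChannel(toBucketAndKey(path), options);
    }

    protected abstract URI toBucketAndKey(Path path);

    protected abstract SeekableByteChannel openObjectChannel(URI bucketAndKey,
                                                             Set<? extends OpenOption> options) throws IOException;
}

The hard parts (write durability, listing, rename semantics, and the
failure-mode differences Brandon points out below) are exactly what those
abstract methods hide.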


Brandon Williams <dri...@gmail.com> wrote on Tue, Mar 4, 2025 at 9:55 PM:

> A failing remote api that you are calling and a failing filesystem you
> are using have different implications.
>
> Kind Regards,
> Brandon
>
> On Tue, Mar 4, 2025 at 7:47 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
> >
> > I don't say that using remote object storage is useless.
> >
> > I am just saying that I don't see the difference. I have not measured
> it, but I can imagine that a mounted s3 bucket would use, under the hood,
> the same calls to the s3 api. How else would it be done? You need to talk
> to remote s3 storage eventually anyway. So why does it matter if we call
> the s3 api from Java or by other means through some "s3 driver"? It is
> eventually using the same thing, no?
> >
> > On Tue, Mar 4, 2025 at 12:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
> >>
> >> Mounting an s3 bucket as a directory is an easy but poor implementation
> of object-backed storage for databases.
> >>
> >> Object storage is durable (most data loss is due to bugs, not
> concurrent hardware failures), cheap (it can be 5-10x cheaper) and
> ubiquitous. A huge number of modern systems are object-storage-only
> because the approximately infinite scale / cost / throughput tradeoffs
> often make up for the latency.
> >>
> >> Outright dismissing object storage for Cassandra is short-sighted - it
> needs to be done in a way that makes sense, not just blindly copying over
> the block access patterns to object storage.
> >>
> >>
> >> On Mar 4, 2025, at 11:19 AM, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
> >>
> >> 
> >> I do not think we need this CEP, honestly. I don't want to diss this
> unnecessarily, but if you mount remote storage locally (e.g. mounting an
> s3 bucket as if it were any other directory on the node's machine), then
> what is this CEP good for?
> >>
> >> Not to mention the necessity of putting all the dependencies needed to
> talk to the respective remote storage on Cassandra's class path,
> introducing potential problems with dependencies and their possible
> incompatibilities / different versions, etc.
> >>
> >> On Thu, Feb 27, 2025 at 6:21 AM C. Scott Andreas <sc...@paradoxica.net>
> wrote:
> >>>
> >>> I’d love to see this implemented — where “this” is a proxy for some
> notion of support for remote object storage, perhaps usable by compaction
> strategies like TWCS to migrate data older than a threshold from a local
> filesystem to remote object.
> >>>
> >>> It’s not an area where I can currently dedicate engineering effort.
> But if others are interested in contributing a feature like this, I’d see
> it as valuable for the project and would be happy to collaborate on
> design/architecture/goals.
> >>>
> >>> – Scott
> >>>
> >>> On Feb 26, 2025, at 6:56 AM, guo Maxwell <cclive1...@gmail.com> wrote:
> >>>
> >>> 
> >>> Is anyone else interested in continuing to discuss this topic?
> >>>
> >>> guo Maxwell <cclive1...@gmail.com> wrote on Fri, Sep 20, 2024 at 09:44:
> >>>>
> >>>> I discussed this offline with Claude; he is no longer working on this.
> >>>>
> >>>> It's a pity. I think this is a very valuable thing. Commitlog
> archiving and restore may be able to use the relevant code once it is
> completed.
> >>>>
> >>>> Patrick McFadin <pmcfa...@gmail.com> wrote on Fri, Sep 20, 2024 at 2:01 AM:
> >>>>>
> >>>>> Thanks for reviving this one!
> >>>>>
> >>>>> On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell <cclive1...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Is there any update on this topic? It seems that things could make
> big progress if Jake Luciani can find someone who can make the
> FileSystemProvider code accessible.
> >>>>>>
> >>>>>> Jon Haddad <j...@jonhaddad.com> wrote on Sat, Dec 16, 2023 at 05:29:
> >>>>>>>
> >>>>>>> At a high level I really like the idea of being able to better
> leverage cheaper storage, especially object stores like S3.
> >>>>>>>
> >>>>>>> One important thing though - I feel pretty strongly that there's a
> big, deal-breaking downside. Backups, disk failure policies, snapshots and
> possibly repairs (which haven't been particularly great in the past) would
> get more complicated, and of course there's the issue of failure recovery
> being only partially possible if you're looking at a durable block store
> paired with an ephemeral one, with some of your data not replicated to the
> cold side. That introduces a failure case that's unacceptable for most
> teams, which results in needing to implement potentially 2 different
> backup solutions. This is operationally complex with a lot of surface area
> for headaches. I think a lot of teams would probably have an issue with
> the big question mark around durability, and I would probably avoid it
> myself.
> >>>>>>>
> >>>>>>> On the other hand, I'm +1 if we approach it slightly differently -
> where _all_ the data is located on the cold storage, with the local hot
> storage used as a cache. This means we can use the cold directories for
> the complete dataset, simplifying backups and node replacements.
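
To make that "cold holds everything, hot is a cache" model concrete, here
is a very rough read-path sketch; all names are made up for illustration,
and ObjectStoreClient is an assumed interface, not any real SDK:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: the object store holds the complete dataset and the
// local disk is just a cache in front of it. ObjectStoreClient is an
// assumed interface, not a real SDK class.
public final class CachedComponentReader
{
    public interface ObjectStoreClient
    {
        InputStream get(String key) throws IOException;
    }

    private final Path cacheDir;
    private final ObjectStoreClient coldStore;

    public CachedComponentReader(Path cacheDir, ObjectStoreClient coldStore)
    {
        this.cacheDir = cacheDir;
        this.coldStore = coldStore;
    }

    // Returns a local path for the requested component (assumed to be a flat
    // file name), fetching it from the cold store into the cache on a miss.
    public Path open(String componentName) throws IOException
    {
        Path cached = cacheDir.resolve(componentName);
        if (Files.exists(cached))
            return cached;                      // cache hit, serve locally

        Path tmp = Files.createTempFile(cacheDir, "fetch-", ".part");
        try (InputStream in = coldStore.get(componentName))
        {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING); // publish
        return cached;
    }
}

Node replacement would then only need to repopulate the cache lazily, since
the cold side already holds the complete dataset.
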
> >>>>>>>
> >>>>>>> For a little background, we had a ticket several years ago where I
> pointed out it was possible to do this *today* at the operating system
> level as long as you're using block devices (vs an object store) and LVM
> [1].  For example, this works well with GP3 EBS w/ low IOPS provisioning +
> local NVMe to get a nice balance of great read performance without going
> nuts on the cost for IOPS.  I also wrote about this in a little more detail
> in my blog [2].  There's also the new Mountpoint for Amazon S3 tech in AWS,
> which pretty much does exactly what I've suggested above [3] and is
> probably worth evaluating just to get a feel for it.
> >>>>>>>
> >>>>>>> I'm not insisting we require LVM or the AWS S3 fs, since that
> would rule out other cloud providers, but I am pretty confident that the
> entire dataset should reside in the "cold" side of things for the practical
> and technical reasons I listed above.  I don't think it massively changes
> the proposal, and should simplify things for everyone.
> >>>>>>>
> >>>>>>> Jon
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
> >>>>>>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
> >>>>>>> [3]
> https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren <cla...@apache.org>
> wrote:
> >>>>>>>>
> >>>>>>>> Is there still interest in this?  Can we get some points down on
> electrons so that we all understand the issues?
> >>>>>>>>
> >>>>>>>> While it is fairly simple to redirect the read/write to something
> other than the local system for a single node, this will not solve the
> problem for tiered storage.
> >>>>>>>>
> >>>>>>>> Tiered storage will require that on read/write the primary key be
> assessed to determine whether the read/write should be redirected.  My
> reasoning for this statement is that in a cluster with a replication
> factor greater than 1, the node will store data for the keys that would be
> allocated to it in a cluster with a replication factor of 1, as well as
> some keys from nodes earlier in the ring.
> >>>>>>>>
> >>>>>>>> Even if we can get the primary keys for all the data we want to
> write to "cold storage" to map to a single node, a replication factor > 1
> means that data will also be placed in "normal storage" on subsequent
> nodes.
> >>>>>>>>
> >>>>>>>> To overcome this, we have to explore ways to route data to
> different storage based on the keys, and that different storage may have
> to be available on _all_ the nodes.
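
A rough sketch of that per-key routing decision, with made-up names (the
predicate could be driven by token range, write timestamp age, or anything
else):

import java.nio.file.Path;
import java.util.function.Predicate;

// Illustrative only: every node has both tiers configured, and a per-key
// predicate decides which tier a read/write goes to.
public final class TieredStorageRouter<K>
{
    private final Path hotDir;
    private final Path coldDir;
    private final Predicate<K> belongsToColdTier;

    public TieredStorageRouter(Path hotDir, Path coldDir, Predicate<K> belongsToColdTier)
    {
        this.hotDir = hotDir;
        this.coldDir = coldDir;
        this.belongsToColdTier = belongsToColdTier;
    }

    // Runs on every replica: because RF > 1 spreads the same key across
    // several nodes, cold data cannot simply be pinned to one dedicated node.
    public Path directoryFor(K primaryKey)
    {
        return belongsToColdTier.test(primaryKey) ? coldDir : hotDir;
    }
}

Whatever backs coldDir then has to be reachable from every node, which is
the availability constraint described above.
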
> >>>>>>>>
> >>>>>>>> Have any of the partial solutions mentioned in this email chain
> (or others) solved this problem?
> >>>>>>>>
> >>>>>>>> Claude
>
