Is anyone else interested in continuing to discuss this topic?

guo Maxwell <cclive1...@gmail.com> wrote on Fri, Sep 20, 2024 at 09:44:
> I discussed this offline with Claude; he is no longer working on this.
>
> It's a pity. I think this is a very valuable thing. Commitlog archiving and restore may be able to reuse the relevant code if it is completed.
>
> Patrick McFadin <pmcfa...@gmail.com> wrote on Fri, Sep 20, 2024 at 2:01 AM:
>
>> Thanks for reviving this one!
>>
>> On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell <cclive1...@gmail.com> wrote:
>>
>>> Is there any update on this topic? It seems things could make big progress if Jake Luciani can find someone who can make the FileSystemProvider code accessible.
>>>
>>> Jon Haddad <j...@jonhaddad.com> wrote on Sat, Dec 16, 2023 at 05:29:
>>>
>>>> At a high level I really like the idea of being able to better leverage cheaper storage, especially object stores like S3.
>>>>
>>>> One important thing though: I feel pretty strongly that there's a big, deal-breaking downside. Backups, disk failure policies, snapshots, and possibly repairs (which haven't been particularly great in the past) would get more complicated, and there's the issue of failure recovery being only partially possible if you're looking at a durable block store paired with an ephemeral one, with some of your data not replicated to the cold side. That introduces a failure case that's unacceptable for most teams, and it results in needing to implement potentially two different backup solutions. This is operationally complex, with a lot of surface area for headaches. I think a lot of teams would have an issue with the big question mark around durability, and I would probably avoid it myself.
>>>>
>>>> On the other hand, I'm +1 if we approach it slightly differently: _all_ the data is located on the cold storage, with the local hot storage used as a cache. This means we can use the cold directories for the complete dataset, simplifying backups and node replacements.
>>>>
>>>> For a little background, we had a ticket several years ago where I pointed out it was possible to do this *today* at the operating system level, as long as you're using block devices (vs an object store) and LVM [1]. For example, this works well with GP3 EBS with low IOPS provisioning plus local NVMe, which gives a nice balance of great read performance without going nuts on the cost for IOPS. I also wrote about this in a little more detail on my blog [2]. There's also the new Mountpoint tech in AWS [3], which pretty much does exactly what I've suggested above and is probably worth evaluating just to get a feel for it.
>>>>
>>>> I'm not insisting we require LVM or the AWS S3 fs, since that would rule out other cloud providers, but I am pretty confident that the entire dataset should reside on the "cold" side of things, for the practical and technical reasons listed above. I don't think it massively changes the proposal, and it should simplify things for everyone.
>>>>
>>>> Jon
>>>>
>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
>>>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
>>>> [3] https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
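(Illustrative aside: to make the shape described above concrete, here is a minimal Java sketch of "cold tier is authoritative, local disk is a lazily filled read cache". The class, paths, and method names are hypothetical; this is not Cassandra's actual ChannelProxy or FileSystemProvider code, just the read-through/write-through split under those assumptions.)

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    // Sketch only: coldDir holds the complete dataset, cacheDir is a local read cache.
    public final class ReadThroughStore
    {
        private final Path coldDir;   // complete dataset, e.g. an EBS- or S3-backed mount
        private final Path cacheDir;  // local NVMe holding recently read components

        public ReadThroughStore(Path coldDir, Path cacheDir)
        {
            this.coldDir = coldDir;
            this.cacheDir = cacheDir;
        }

        // Reads check the local cache first and fault the file in from the cold tier on a miss.
        public FileChannel openForRead(String component) throws IOException
        {
            Path cached = cacheDir.resolve(component);
            if (!Files.exists(cached))
            {
                Files.createDirectories(cached.getParent());
                // A real implementation would also need locking, partial-file handling and eviction.
                Files.copy(coldDir.resolve(component), cached, StandardCopyOption.REPLACE_EXISTING);
            }
            return FileChannel.open(cached, StandardOpenOption.READ);
        }

        // Writes go straight to the cold tier so it always holds the complete dataset.
        public FileChannel openForWrite(String component) throws IOException
        {
            Path target = coldDir.resolve(component);
            Files.createDirectories(target.getParent());
            return FileChannel.open(target, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }
    }

Routing writes only to the cold directory is what buys the simplification argued for above: backups and node replacement only ever need to look at one place.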
>>>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren <cla...@apache.org> wrote:
>>>>
>>>>> Is there still interest in this? Can we get some points down on electrons so that we all understand the issues?
>>>>>
>>>>> While it is fairly simple to redirect reads/writes to something other than the local system for a single node, this will not solve the problem for tiered storage.
>>>>>
>>>>> Tiered storage will require that, on each read/write, the primary key be assessed to determine whether the operation should be redirected. My reasoning for this statement is that, in a cluster with a replication factor greater than 1, a node will store data for the keys that would be allocated to it in a cluster with a replication factor of 1, as well as some keys from nodes earlier in the ring.
>>>>>
>>>>> Even if we can get the primary keys for all the data we want to write to "cold storage" to map to a single node, a replication factor > 1 means that data will also be placed in "normal storage" on subsequent nodes.
>>>>>
>>>>> To overcome this, we have to explore ways to route data to different storage based on the keys, and that different storage may have to be available on _all_ the nodes.
>>>>>
>>>>> Have any of the partial solutions mentioned in this email chain (or others) solved this problem?
>>>>>
>>>>> Claude
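(Illustrative aside: a toy Java sketch of the per-key routing problem described above, assuming a cluster-wide list of "cold" token ranges. TieredRouter, Range, and Tier are invented names for illustration, not Cassandra classes. Because RF > 1 means every replica also stores keys from neighbouring ranges, the check has to run on every node for every read/write, which is also why the cold storage backend would have to be reachable from all nodes.)

    import java.util.List;

    // Sketch only: decide per partition token whether data belongs on hot or cold storage.
    public final class TieredRouter
    {
        public enum Tier { HOT, COLD }

        // Half-open token range (left, right]; wrap-around ranges are ignored for brevity.
        public static final class Range
        {
            final long left;
            final long right;

            Range(long left, long right) { this.left = left; this.right = right; }

            boolean contains(long token) { return token > left && token <= right; }
        }

        private final List<Range> coldRanges; // must be configured identically on every node

        public TieredRouter(List<Range> coldRanges)
        {
            this.coldRanges = coldRanges;
        }

        // Called on every read and write with the partition key's token,
        // regardless of whether this node is the "primary" owner of that token.
        public Tier tierFor(long partitionToken)
        {
            for (Range r : coldRanges)
                if (r.contains(partitionToken))
                    return Tier.COLD;
            return Tier.HOT;
        }
    }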