Is anyone else interested in continuing to discuss this topic?

On Fri, Sep 20, 2024 at 9:44 AM guo Maxwell <cclive1...@gmail.com> wrote:

> I discussed this offline with Claude; he is no longer working on this.
>
> It's a pity; I think this would be very valuable. Commitlog archiving
> and restore may be able to reuse the relevant code once it is completed.
>
> On Fri, Sep 20, 2024 at 2:01 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>
>> Thanks for reviving this one!
>>
>> On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell <cclive1...@gmail.com>
>> wrote:
>>
>>> Is there any update on this topic?  It seems things could make significant
>>> progress if Jake Luciani can find someone to make the
>>> FileSystemProvider code accessible.
>>>
>>> On Sat, Dec 16, 2023 at 5:29 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> At a high level, I really like the idea of being able to better leverage
>>>> cheaper storage, especially object stores like S3.
>>>>
>>>> One important thing though - I feel pretty strongly that there's a big,
>>>> deal-breaking downside.  Backups, disk failure policies, snapshots and
>>>> possibly repairs, which haven't been particularly great in the past, would
>>>> get more complicated, and of course there's the issue of failure recovery
>>>> being only partially possible if you pair a durable block store with an
>>>> ephemeral one and some of your data isn't replicated to the cold side.
>>>> That introduces a failure case that's unacceptable for most teams, which
>>>> results in needing to implement potentially two different backup
>>>> solutions.  This is operationally complex with a lot of surface area for
>>>> headaches.  I think a lot of teams would have an issue with the big
>>>> question mark around durability, and I would probably avoid it myself.
>>>>
>>>> On the other hand, I'm +1 if we approach it slightly differently - where
>>>> _all_ the data is located on the cold storage, with the local hot storage
>>>> used as a cache.  This means we can use the cold directories for the
>>>> complete dataset, simplifying backups and node replacements.
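>>>>
>>>> To make the cache idea concrete, here is a minimal, purely illustrative
>>>> sketch in plain java.nio - the class and method names are hypothetical and
>>>> not part of any Cassandra API; it only shows the shape of "cold holds the
>>>> complete dataset, local hot storage is a read-through cache":
>>>>
>>>> import java.io.IOException;
>>>> import java.nio.file.Files;
>>>> import java.nio.file.Path;
>>>> import java.nio.file.StandardCopyOption;
>>>>
>>>> // Hypothetical sketch: the cold directory holds every SSTable component,
>>>> // the hot directory is only a cache that can be lost without losing data.
>>>> public class ReadThroughTier
>>>> {
>>>>     private final Path hotDir;   // fast local storage, e.g. NVMe
>>>>     private final Path coldDir;  // durable storage with the full dataset
>>>>
>>>>     public ReadThroughTier(Path hotDir, Path coldDir)
>>>>     {
>>>>         this.hotDir = hotDir;
>>>>         this.coldDir = coldDir;
>>>>     }
>>>>
>>>>     // Return a readable local path, pulling the file from cold storage
>>>>     // into the cache on a miss.
>>>>     public Path read(String component) throws IOException
>>>>     {
>>>>         Path hot = hotDir.resolve(component);
>>>>         if (Files.exists(hot))
>>>>             return hot;                       // cache hit
>>>>         Path cold = coldDir.resolve(component);
>>>>         Files.createDirectories(hot.getParent());
>>>>         Files.copy(cold, hot, StandardCopyOption.REPLACE_EXISTING);
>>>>         return hot;
>>>>     }
>>>>
>>>>     // Writes land on the cold tier first, so a lost hot volume never
>>>>     // loses data; the hot copy is just a warm cache entry.
>>>>     public void write(String component, byte[] contents) throws IOException
>>>>     {
>>>>         Path cold = coldDir.resolve(component);
>>>>         Files.createDirectories(cold.getParent());
>>>>         Files.write(cold, contents);
>>>>         Path hot = hotDir.resolve(component);
>>>>         Files.createDirectories(hot.getParent());
>>>>         Files.copy(cold, hot, StandardCopyOption.REPLACE_EXISTING);
>>>>     }
>>>> }
>>>>
>>>> With a layout like this, node replacement only has to repopulate the cache
>>>> lazily, since the cold side already holds everything.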
>>>>
>>>> For a little background, we had a ticket several years ago where I
>>>> pointed out it was possible to do this *today* at the operating system
>>>> level as long as you're using block devices (vs an object store) and LVM
>>>> [1].  For example, this works well with GP3 EBS w/ low IOPS provisioning +
>>>> local NVMe to get a nice balance of great read performance without going
>>>> nuts on the cost for IOPS.  I also wrote about this in a little more detail
>>>> in my blog [2].  There's also the new Mountpoint for Amazon S3, which
>>>> pretty much does exactly what I've suggested above [3] and is probably
>>>> worth evaluating just to get a feel for it.
>>>>
>>>> I'm not insisting we require LVM or the AWS S3 fs, since that would
>>>> rule out other cloud providers, but I am pretty confident that the entire
>>>> dataset should reside in the "cold" side of things for the practical and
>>>> technical reasons I listed above.  I don't think it massively changes the
>>>> proposal, and it should simplify things for everyone.
>>>>
>>>> Jon
>>>>
>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
>>>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
>>>> [3]
>>>> https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
>>>>
>>>>
>>>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren <cla...@apache.org>
>>>> wrote:
>>>>
>>>>> Is there still interest in this?  Can we get some points down on
>>>>> electrons so that we all understand the issues?
>>>>>
>>>>> While it is fairly simple to redirect reads/writes to something other
>>>>> than the local system for a single node, this alone will not solve the
>>>>> problem for tiered storage.
>>>>>
>>>>> Tiered storage will require that, on each read/write, the primary key be
>>>>> assessed to determine whether the operation should be redirected.  My
>>>>> reasoning is that in a cluster with a replication factor greater than 1,
>>>>> a node stores data for the keys that would be allocated to it with a
>>>>> replication factor of 1, as well as some keys from nodes earlier in the
>>>>> ring.
>>>>>
>>>>> Even if we can get the primary keys for all the data we want to write
>>>>> to "cold storage" to map to a single node a replication factor > 1 means
>>>>> that data will also be placed in "normal storage" on subsequent nodes.
>>>>>
>>>>> To overcome this, we have to explore ways to route data to different
>>>>> storage based on the keys, and that different storage may have to be
>>>>> available on _all_ the nodes.
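>>>>>
>>>>> As a purely illustrative sketch (the names below are hypothetical, not
>>>>> Cassandra APIs), per-key routing could look something like this: every
>>>>> replica mounts both tiers and chooses a directory from the partition
>>>>> key's token, rather than a whole node being "hot" or "cold":
>>>>>
>>>>> import java.nio.file.Path;
>>>>> import java.util.function.LongPredicate;
>>>>>
>>>>> // Hypothetical sketch of per-key tier routing; both tiers must be
>>>>> // mounted on every node, as argued above.
>>>>> public class KeyRoutedStorage
>>>>> {
>>>>>     private final Path hotDir;
>>>>>     private final Path coldDir;
>>>>>     private final LongPredicate isColdToken; // policy: which tokens are cold
>>>>>
>>>>>     public KeyRoutedStorage(Path hotDir, Path coldDir, LongPredicate isColdToken)
>>>>>     {
>>>>>         this.hotDir = hotDir;
>>>>>         this.coldDir = coldDir;
>>>>>         this.isColdToken = isColdToken;
>>>>>     }
>>>>>
>>>>>     // The decision is made per partition token on every replica, so
>>>>>     // replicas of "cold" ranges land on cold storage no matter which
>>>>>     // node in the ring happens to hold them.
>>>>>     public Path directoryFor(long partitionToken)
>>>>>     {
>>>>>         return isColdToken.test(partitionToken) ? coldDir : hotDir;
>>>>>     }
>>>>> }
>>>>>
>>>>> The policy itself (age, token range, table, and so on) is beside the
>>>>> point; the point is only that with a replication factor > 1 the choice
>>>>> has to happen per key on every replica, not per node.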
>>>>>
>>>>> Have any of the partial solutions mentioned in this email chain (or
>>>>> others) solved this problem?
>>>>>
>>>>> Claude
>>>>>
>>>>
