Claude if you’re still at POC phase does it make sense for you to perhaps help validate / qualify the work that Henrik seems willing to rebase rather than reinventing the wheel? 

On Sep 26, 2023, at 11:23 PM, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:


I spent a little (very little) time building an S3 implementation using an Apache licensed S3 filesystem package.  I have not yet tested it but if anyone is interested it is at https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be modified to ask the ChannelProxy for the default file system for the file in question.  This should resolve some of the issues my original demo has with some files being created in the data tree.  It may also handle many of the cases for offline tools as well.


On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
Would it be possible to make Jimfs integration production-ready then? I see we are using it in the tests already.

It might be one of the reference implementations of this CEP. If there is a type of workload / type of nodes with plenty of RAM but no disk, some kind of compute nodes, it would just hold it all in memory and we might "flush" it to a cloud-based storage if rendered to be not necessary anymore (whatever that means).

We could then completely bypass the memtables as fetching data from an SSTable from memory would be basically roughly same?

On the other hand, that might be achieved by creating a ramdisk so I am not sure what exactly we would gain here. However, if it was eventually storing these SSTables in a cloud storage, we might "compact" "TWCS tables" automatically after so-and-so period by moving them there.

________________________________________
From: Jake Luciani <jak...@gmail.com>
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a pretty high value piece of functionality. I am happy to see there is interest in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to quickly turn into a dead end as we get into really using multiple storage backends. We need to be able to list files and really the full range of filesystem interactions that Java supports should work with any backend to make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to creates paths for alternate backends where appropriate, but that works is probably necessary even with `ChanelProxyFactory` and munging UNIX paths (vs supporting multiple Fileystems). There will probably also be backend specific behaviors that show up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and their individual configuration that can be used, as well as configuration and support for a "backend file router" for file creation (and opening) that can be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the normal directory tree to other disk so that compaction can occur in situations where there is not enough disk space for compaction and the processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


--
http://twitter.com/tjake

Reply via email to