I spent a little (very little) time building an S3 implementation using an
Apache-licensed S3 filesystem package.  I have not yet tested it, but if
anyone is interested, it is at
https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be
modified to ask the ChannelProxy for the default file system for the file
in question.  This should resolve some of the issues my original demo has
with some files being created in the data tree.  It may also handle many of
the cases for offline tools as well.
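
One way to picture that lookup, as a minimal sketch using only the JDK
`java.nio.file` API: `FileSystemRouter`, `addRoute`, `fileSystemFor` and
`resolve` are hypothetical names made up for illustration, not existing
Cassandra or ChannelProxy classes, and matching by path prefix is an
assumption about how files might be assigned to a backend.

```java
import java.nio.file.FileSystem;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Cassandra): decides which FileSystem
// owns a given path so callers never hard-code the default file system.
final class FileSystemRouter {
    // Insertion-ordered, so the first registered matching prefix wins.
    private final Map<String, FileSystem> routes = new LinkedHashMap<>();
    private final FileSystem fallback;

    FileSystemRouter(FileSystem fallback) {
        this.fallback = fallback;
    }

    // Register a backend for every path that starts with the given prefix.
    void addRoute(String prefix, FileSystem fs) {
        routes.put(prefix, fs);
    }

    // Return the FileSystem for the path, or the fallback if no route matches.
    FileSystem fileSystemFor(String path) {
        for (Map.Entry<String, FileSystem> route : routes.entrySet()) {
            if (path.startsWith(route.getKey())) {
                return route.getValue();
            }
        }
        return fallback;
    }

    // Build the Path on whichever FileSystem owns it.
    Path resolve(String path) {
        return fileSystemFor(path).getPath(path);
    }
}
```

A remote backend would be registered with `addRoute` under the directory
prefix it should own; every other path falls through to the fallback,
which would normally be `FileSystems.getDefault()`.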


On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Would it be possible to make Jimfs integration production-ready then? I
> see we are using it in the tests already.
>
> It might be one of the reference implementations of this CEP. If there is
> a type of workload / type of node with plenty of RAM but no disk, some
> kind of compute node, it would just hold it all in memory and we might
> "flush" it to cloud-based storage once it is deemed no longer necessary
> (whatever that means).
>
> We could then completely bypass the memtables, as fetching data from an
> SSTable held in memory would be roughly the same?
>
> On the other hand, that might be achieved by creating a ramdisk, so I am
> not sure what exactly we would gain here. However, if it eventually
> stored these SSTables in cloud storage, we might "compact" "TWCS tables"
> automatically after some period by moving them there.
>
> ________________________________________
> From: Jake Luciani <jak...@gmail.com>
> Sent: Tuesday, September 26, 2023 19:03
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias
> external storage locations
>
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
> >
> > I agree with Ariel: the more suitable insertion point is probably the
> JDK-level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> Filesystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires driving this change:
> >
> > 1. The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree, on another disk, so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing of the moved data cannot be suspended.
> > 2. The ability to store infrequently used data on slower, cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>
