Henrik and Guo,

Have you moved forward on this topic? I have not seen anything recently.

I have posted a solution that intercepts calls for directories and injects directories from different FileSystems. This means that a node can have keyspaces both on the local file system and on one or more other FileSystem implementations.
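
As a rough illustration of the idea (not the code in the branch), the redirect can be pictured as a lookup from keyspace to a root Path. The class and method names below (KeyspaceDirectoryResolver, redirect, tableDirectory) are hypothetical and are not Cassandra APIs. The only thing relied on is standard JDK behaviour: every java.nio.file.Path carries its own FileSystem.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch only; none of these names exist in Cassandra.
 * Maps a keyspace to a root Path. Because a Path knows its FileSystem,
 * callers do not need to know which implementation backs the directory.
 */
final class KeyspaceDirectoryResolver {
    private final Path defaultDataRoot;
    private final Map<String, Path> redirectedRoots = new ConcurrentHashMap<>();

    KeyspaceDirectoryResolver(Path defaultDataRoot) {
        this.defaultDataRoot = defaultDataRoot;
    }

    /** Register a keyspace whose data lives under another root, possibly on another FileSystem. */
    void redirect(String keyspace, Path root) {
        redirectedRoots.put(keyspace, root);
    }

    /** Directory for a table: under the redirected root if one is registered, else under the default root. */
    Path tableDirectory(String keyspace, String table) {
        Path root = redirectedRoots.getOrDefault(keyspace, defaultDataRoot);
        return root.resolve(keyspace).resolve(table);
    }
}
```

With this shape, using a different FileSystem implementation only changes the Path handed to redirect(); callers of tableDirectory() are unaffected.
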
I look forward to hearing from you,
Claude

On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:

> After a bit more analysis and some testing I have a new branch that I
> think solves the problem. [1] I have also created a pull request internal
> to my clone so that it is easy to see the changes. [2]
>
> The strategy change is to move the insertion of the proxy from the
> Cassandra File class to the Directories class. This means that all actions
> on the table are captured (this solves a problem encountered in the
> earlier strategy).
> The strategy is to create a path on a different FileSystem and return
> that. The example code only moves the data for the table to another
> directory on the same FileSystem, but using a different FileSystem
> implementation should be a trivial change.
>
> The current code works on an entire keyspace. While code exists to
> limit the redirect to a table, I have not tested that branch yet and am not
> certain that it will work. There is also some code (i.e. the PathParser)
> that may no longer be needed but has not been removed yet.
>
> Please take a look and let me know if you see any issues with this
> solution.
>
> Claude
>
> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
> [2] https://github.com/Claudenw/cassandra/pull/5/files
>
> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr <claude.war...@aiven.io>
> wrote:
>
>> I have been exploring adding a second Path to the Cassandra File object.
>> The original path is the path within the standard Cassandra directory
>> tree, and the second is a translated path used when there is what was
>> called a ChannelProxy in place.
>>
>> A problem arises when Directories.getLocationForDisk() is called. It
>> seems to be looking for locations that start with the data directory
>> absolute path. I can change it to make it look for the original path, not
>> the translated path.
>> But in other cases the translated path is the one
>> that is needed.
>>
>> I notice that there is a concept of multiple file locations in the code
>> base, particularly in the Directories.DataDirectories class where there are
>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>> constructor, and in the
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method,
>> which returns an array of String and is populated from the cassandra.yaml
>> file.
>>
>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>> method only ever seems to return an array of one item.
>>
>> Why does
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() return an
>> array?
>>
>> Should the system set the path to the root of the ColumnFamilyStore in
>> the ColumnFamilyStore directories instance?
>> Should Directories.getLocationForDisk() do the proxy to the other
>> file system?
>>
>> Where is the proper location to change from the standard internal
>> representation to the remote location?
>>
>> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr <claude.war...@aiven.io>
>> wrote:
>>
>>> Sorry, I was out sick and did not respond yesterday.
>>>
>>> Henrik, how does your system work? What is the design strategy? Also,
>>> is your code available somewhere?
>>>
>>> After looking at the code some more I think that the best solution is
>>> not a FileChannelProxy but to modify the Cassandra File class to get a
>>> FileSystem object from a Factory to build the Path that is used within that
>>> object. I think that this makes it a very small change that will pick up
>>> 90+% of the cases. We then just need to find the edge cases.
>>>
>>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> Super excited about this as well. Happy to help test with Azure and any
>>>> other way needed.
>>>>
>>>> Thanks,
>>>> German
>>>> ------------------------------
>>>> *From:* guo Maxwell <cclive1...@gmail.com>
>>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations
>>>>
>>>> Thanks. So I think a Jira can be created now. And I'd be happy to
>>>> provide some help with this as well if needed.
>>>>
>>>> Henrik Ingo <henrik.i...@datastax.com> wrote on Thu, Sep 28, 2023 at 00:21:
>>>>
>>>> It seems I was volunteered to rebase the Astra implementation of this
>>>> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
>>>> of course.) I'll try to get going today or tomorrow, so that this
>>>> discussion can then benefit from having that code available for inspection,
>>>> and potentially use it as a solution to this use case.
>>>>
>>>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
>>>>
>>>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>>>> Works with S3/GCS/Azure.
>>>>
>>>> I'll ask someone on our end to make it accessible.
>>>>
>>>> This would work by having a bucket prefix per node. But there are lots
>>>> of details needed to support things like out of bound compaction
>>>> (mentioned in CEP).
>>>>
>>>> Jake
>>>>
>>>> On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>>>> >
>>>> > I agree with Ariel, the more suitable insertion point is probably the
>>>> > JDK level FileSystemProvider and FileSystem abstraction.
>>>> >
>>>> > It might also be that we can reuse existing work here in some cases?
>>>> >
>>>> > On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > Support for multiple storage backends including remote storage
>>>> > backends is a pretty high value piece of functionality. I am happy to see
>>>> > there is interest in that.
>>>> >
>>>> > I think that `ChannelProxyFactory` as an integration point is going
>>>> > to quickly turn into a dead end as we get into really using multiple
>>>> > storage backends. We need to be able to list files, and really the full
>>>> > range of filesystem interactions that Java supports should work with any
>>>> > backend to make development, testing, and using existing code
>>>> > straightforward.
>>>> >
>>>> > It's a little more work to get C* to create paths for alternate
>>>> > backends where appropriate, but that work is probably necessary even with
>>>> > `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
>>>> > Filesystems). There will probably also be backend-specific behaviors that
>>>> > show up above the `ChannelProxy` layer that will depend on the backend.
>>>> >
>>>> > Ideally there would be some config to specify several backend
>>>> > filesystems and their individual configuration that can be used, as well as
>>>> > configuration and support for a "backend file router" for file creation
>>>> > (and opening) that can be used to route files to the most appropriate
>>>> > backend.
>>>> >
>>>> > Regards,
>>>> > Ariel
>>>> >
>>>> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>>>> >
>>>> > I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>> > outside of the standard storage space.
>>>> >
>>>> > There are two desires driving this change:
>>>> >
>>>> > The ability to temporarily move some keyspaces/tables to storage
>>>> > outside the normal directory tree, on another disk, so that compaction can
>>>> > occur in situations where there is not enough disk space for compaction and
>>>> > the processing of the moved data cannot be suspended.
>>>> > The ability to store infrequently used data on slower, cheaper storage
>>>> > layers.
>>>> >
>>>> > I have a working POC implementation [2], though there are some issues
>>>> > still to be solved and much logging to be reduced.
>>>> >
>>>> > I look forward to productive discussions,
>>>> > Claude
>>>> >
>>>> > [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>> >
>>>>
>>>> --
>>>> http://twitter.com/tjake
>>>>
>>>> --
>>>>
>>>> Henrik Ingo
>>>>
>>>> c. +358 40 569 7354
>>>>
>>>> w. www.datastax.com
>>>>
>>>> <https://www.facebook.com/datastax> <https://twitter.com/datastax>
>>>> <https://www.linkedin.com/company/datastax/>
>>>> <https://github.com/datastax/>
>>>>
>>>> --
>>>> you are the apple of my eye !
>>>>