If it is OK for Henrik to rebase the Astra implementation of this functionality (FileSystemProvider) onto Cassandra trunk, then we can create a Jira ticket to move this forward a small step.

Claude Warren, Jr <claude.war...@aiven.io> wrote on Wed, Oct 18, 2023 at 15:05:

> Henrik and Guo,
>
> Have you moved forward on this topic? I have not seen anything recently.
> I have posted a solution that intercepts calls for directories and injects
> directories from different FileSystems. This means that a node can have
> keyspaces both on the local file system and on one or more other FileSystem
> implementations.
>
> I look forward to hearing from you,
> Claude
>
> On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:
>
>> After a bit more analysis and some testing I have a new branch that I
>> think solves the problem. [1] I have also created a pull request internal
>> to my clone so that it is easy to see the changes. [2]
>>
>> The strategy change is to move the insertion of the proxy from the
>> Cassandra File class to the Directories class. This means that all actions
>> on the table are captured (this solves a problem encountered in the
>> earlier strategy).
>> The strategy is to create a path on a different FileSystem and return
>> that. The example code only moves the data for the table to another
>> directory on the same FileSystem, but using a different FileSystem
>> implementation should be a trivial change.
>>
>> The current code works on an entire keyspace. While code exists to
>> limit the redirect to a table, I have not tested that branch yet and am not
>> certain that it will work. There is also some code (i.e. the PathParser)
>> that may no longer be needed but has not been removed yet.
>>
>> Please take a look and let me know if you see any issues with this
>> solution.
>>
>> Claude
>>
>> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
>> [2] https://github.com/Claudenw/cassandra/pull/5/files
>>
>> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:
>>
>>> I have been exploring adding a second Path to the Cassandra File
>>> object. The original path is the path within the standard Cassandra
>>> directory tree, and the second is a translated path used when what
>>> was called a ChannelProxy is in place.
>>>
>>> A problem arises when Directories.getLocationForDisk() is called.
>>> It seems to be looking for locations that start with the data directory
>>> absolute path. I can change it to look for the original path rather than
>>> the translated path, but in other cases the translated path is the one
>>> that is needed.
>>>
>>> I notice that there is a concept of multiple file locations in the code
>>> base, particularly in the Directories.DataDirectories class, where there are
>>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>>> constructor, and in the
>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method,
>>> which returns an array of String and is populated from the cassandra.yaml
>>> file.
>>>
>>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>>> method only ever seems to return an array of one item.
>>>
>>> Why does
>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() return an
>>> array?
>>>
>>> Should the system set the path to the root of the ColumnFamilyStore in
>>> the ColumnFamilyStore directories instance?
>>> Should Directories.getLocationForDisk() do the proxying to the other
>>> file system?
>>>
>>> Where is the proper location to change from the standard internal
>>> representation to the remote location?
>>>
>>> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:
>>>
>>>> Sorry I was out sick and did not respond yesterday.
>>>>
>>>> Henrik, how does your system work? What is the design strategy? Also,
>>>> is your code available somewhere?
>>>>
>>>> After looking at the code some more I think that the best solution is
>>>> not a FileChannelProxy but to modify the Cassandra File class to get a
>>>> FileSystem object from a Factory to build the Path that is used within that
>>>> object. I think that this makes it a very small change that will pick up
>>>> 90+% of the cases. We then just need to find the edge cases.
>>>>
>>>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <dev@cassandra.apache.org> wrote:
>>>>
>>>>> Super excited about this as well. Happy to help test with Azure and
>>>>> any other way needed.
>>>>>
>>>>> Thanks,
>>>>> German
>>>>> ------------------------------
>>>>> *From:* guo Maxwell <cclive1...@gmail.com>
>>>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable
>>>>> ChannelProxy to alias external storage locations
>>>>>
>>>>> Thanks. So I think a Jira can be created now, and I'd be happy to
>>>>> provide some help with this as well if needed.
>>>>>
>>>>> Henrik Ingo <henrik.i...@datastax.com> wrote on Thu, Sep 28, 2023 at 00:21:
>>>>>
>>>>> It seems I was volunteered to rebase the Astra implementation of this
>>>>> functionality (FileSystemProvider) onto Cassandra trunk (and publish it,
>>>>> of course). I'll try to get going today or tomorrow, so that this
>>>>> discussion can then benefit from having that code available for
>>>>> inspection, and potentially use it as a solution to this use case.
>>>>>
>>>>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
>>>>>
>>>>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>>>>> Works with S3/GCS/Azure.
>>>>>
>>>>> I'll ask someone on our end to make it accessible.
>>>>>
>>>>> This would work by having a bucket prefix per node.
>>>>> But there are lots of details needed to support things like out of
>>>>> bound compaction (mentioned in the CEP).
>>>>>
>>>>> Jake
>>>>>
>>>>> On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>>>>> >
>>>>> > I agree with Ariel; the more suitable insertion point is probably
>>>>> > the JDK level FileSystemProvider and FileSystem abstraction.
>>>>> >
>>>>> > It might also be that we can reuse existing work here in some cases?
>>>>> >
>>>>> > On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > Support for multiple storage backends, including remote storage
>>>>> > backends, is a pretty high-value piece of functionality. I am happy to see
>>>>> > there is interest in that.
>>>>> >
>>>>> > I think that `ChannelProxyFactory` as an integration point is going
>>>>> > to quickly turn into a dead end as we get into really using multiple
>>>>> > storage backends. We need to be able to list files, and really the full
>>>>> > range of filesystem interactions that Java supports should work with any
>>>>> > backend to make development, testing, and using existing code
>>>>> > straightforward.
>>>>> >
>>>>> > It's a little more work to get C* to create paths for alternate
>>>>> > backends where appropriate, but that work is probably necessary even with
>>>>> > `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
>>>>> > FileSystems). There will probably also be backend-specific behaviors that
>>>>> > show up above the `ChannelProxy` layer that will depend on the backend.
>>>>> >
>>>>> > Ideally there would be some config to specify several backend
>>>>> > filesystems and their individual configuration that can be used, as well
>>>>> > as configuration and support for a "backend file router" for file creation
>>>>> > (and opening) that can be used to route files to the backend most
>>>>> > appropriate.
>>>>> >
>>>>> > Regards,
>>>>> > Ariel
>>>>> >
>>>>> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>>>>> >
>>>>> > I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>>> > outside of the standard storage space.
>>>>> >
>>>>> > There are two desires driving this change:
>>>>> >
>>>>> > 1. The ability to temporarily move some keyspaces/tables to storage
>>>>> >    outside the normal directory tree, on another disk, so that compaction can
>>>>> >    occur in situations where there is not enough disk space for compaction
>>>>> >    and the processing of the moved data cannot be suspended.
>>>>> > 2. The ability to store infrequently used data on slower, cheaper
>>>>> >    storage layers.
>>>>> >
>>>>> > I have a working POC implementation [2], though there are some issues
>>>>> > still to be solved and much logging to be reduced.
>>>>> >
>>>>> > I look forward to productive discussions,
>>>>> > Claude
>>>>> >
>>>>> > [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>>> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>>>
>>>>> --
>>>>> http://twitter.com/tjake
>>>>>
>>>>> --
>>>>> Henrik Ingo
>>>>> c. +358 40 569 7354
>>>>> w. www.datastax.com
>>>>> <https://www.facebook.com/datastax> <https://twitter.com/datastax>
>>>>> <https://www.linkedin.com/company/datastax/>
>>>>> <https://github.com/datastax/>
>>>>>
>>>>> --
>>>>> you are the apple of my eye !
>>>>
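
As a rough illustration of the "backend file router" idea raised in the thread, the sketch below maps a keyspace to a root directory that may belong to any java.nio.file FileSystem, and falls back to a default root otherwise. The class and method names (BackendFileRouter, routeKeyspace, resolve) are hypothetical; they are not part of Cassandra, of the Astra FileSystemProvider, or of either branch linked in the thread. The only thing assumed is the standard java.nio.file API.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical router (illustration only, not Cassandra code): chooses the
 * root directory, and therefore the FileSystem, used for a keyspace.
 */
final class BackendFileRouter {
    // Root used for every keyspace that has no explicit route.
    private final Path defaultRoot;
    // Per-keyspace roots; each Path may come from a different FileSystem.
    private final Map<String, Path> keyspaceRoots = new ConcurrentHashMap<>();

    BackendFileRouter(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    /** Send all tables of the given keyspace to the given root. */
    void routeKeyspace(String keyspace, Path root) {
        keyspaceRoots.put(keyspace, root);
    }

    /** Return root/keyspace/table, using the default root when no route is set. */
    Path resolve(String keyspace, String table) {
        Path root = keyspaceRoots.getOrDefault(keyspace, defaultRoot);
        return root.resolve(keyspace).resolve(table);
    }
}
```

Because Path.resolve keeps the result on the same FileSystem as the root, a root obtained from a non-default FileSystemProvider would yield table paths on that backend without the caller needing to know which backend was chosen. For example, after routeKeyspace("archive", coldRoot), resolve("archive", "events") returns a path under coldRoot, while any other keyspace still resolves under the default root.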