If it is OK for Henrik to rebase the Astra implementation of this
functionality (FileSystemProvider) onto Cassandra trunk, then we can create
a Jira ticket to move this forward by a small step.

On Wed, Oct 18, 2023 at 15:05, Claude Warren, Jr <claude.war...@aiven.io> wrote:

> Henrik and Guo,
>
> Have you moved forward on this topic?  I have not seen anything recently.
> I have posted a solution that intercepts calls for directories and injects
> directories from different FileSystems.  This means that a node can have
> keyspaces both on the local file system and one or more other FileSystem
> implementations.
>
> I look forward to hearing from you,
> Claude
>
>
> On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr <claude.war...@aiven.io>
> wrote:
>
>> After a bit more analysis and some testing I have a new branch that I
>> think solves the problem. [1]  I have also created a pull request internal
>> to my clone so that it is easy to see the changes. [2]
>>
>> The strategy change is to move the insertion of the proxy from the
>> Cassandra File class to the Directories class.  This means that all action
>> with the table is captured (this solves a problem encountered in the
>> earlier strategy).
>> The strategy is to create a path on a different FileSystem and return
>> that.  The example code only moves the data for the table to another
>> directory on the same FileSystem but using a different FileSystem
>> implementation should be a trivial change.
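
A minimal sketch of the redirect idea described above, for readers who want something concrete. `KeyspaceRedirector` and its methods are hypothetical names chosen for illustration; they are not classes in Cassandra or in the FileSystemProxy branch. The sketch only assumes that the redirect target is a `java.nio.file.Path`, which may belong to any FileSystem implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not a class in Cassandra or in the branch above.
final class KeyspaceRedirector {
    // Keyspace name -> root directory that should hold that keyspace's data.
    private final Map<String, Path> redirects = new ConcurrentHashMap<>();

    void redirect(String keyspace, Path targetRoot) {
        redirects.put(keyspace, targetRoot);
    }

    // Returns the table directory: under the redirect root if the keyspace is
    // registered, otherwise under the default data directory.
    Path tableDirectory(Path defaultDataDir, String keyspace, String table) {
        Path root = redirects.getOrDefault(keyspace, defaultDataDir);
        // resolve(String) builds the result on whichever FileSystem owns root,
        // so the caller never needs to know which implementation is in use.
        return root.resolve(keyspace).resolve(table);
    }

    public static void main(String[] args) {
        KeyspaceRedirector redirector = new KeyspaceRedirector();
        redirector.redirect("archive_ks", Paths.get("mnt", "slow_disk"));
        Path data = Paths.get("var", "lib", "cassandra", "data");
        System.out.println(redirector.tableDirectory(data, "archive_ks", "events"));
        System.out.println(redirector.tableDirectory(data, "live_ks", "events"));
    }
}
```

Here both roots live on the default FileSystem, matching the example code described above; passing a root obtained from another FileSystem would be the only change needed in this sketch.
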
>>
>> The current code works on an entire keyspace.  While code exists to
>> limit the redirect to a table, I have not tested that branch yet and am not
>> certain that it will work.  There is also some code (i.e. the PathParser)
>> that may no longer be needed but has not been removed yet.
>>
>> Please take a look and let me know if you see any issues with this
>> solution.
>>
>> Claude
>>
>> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
>> [2] https://github.com/Claudenw/cassandra/pull/5/files
>>
>>
>>
>> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr <
>> claude.war...@aiven.io> wrote:
>>
>>> I have been exploring adding a second Path to the Cassandra File
>>> object.  The original path being the path within the standard Cassandra
>>> directory tree and the second being a translated path when there is what
>>> was called a ChannelProxy in place.
>>>
>>> A problem arises when the Directories.getLocationForDisk() is called.
>>> It seems to be looking for locations that start with the data directory
>>> absolute path.   I can change it to make it look for the original path not
>>> the translated path.  But in other cases the translated path is the one
>>> that is needed.
>>>
>>> I notice that there is a concept of multiple file locations in the code
>>> base, particularly in the Directories.DataDirectories class where there are
>>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>>> constructor, and in the
>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
>>> which returns an array of String and is populated from the cassandra.yaml
>>> file.
>>>
>>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>>> only ever seems to return an array of one item.
>>>
>>> Why does
>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  return an
>>> array?
>>>
>>> Should the system set the path to the root of the ColumnFamilyStore in
>>> the ColumnFamilyStore directories instance?
>>> Should the Directories.getLocationForDisk() do the proxy to the other
>>> file system?
>>>
>>> Where is the proper location to change from the standard internal
>>> representation to the remote location?
>>>
>>>
>>> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr <
>>> claude.war...@aiven.io> wrote:
>>>
>>>> Sorry I was out sick and did not respond yesterday.
>>>>
>>>> Henrik,  How does your system work?  What is the design strategy?  Also
>>>> is your code available somewhere?
>>>>
>>>> After looking at the code some more I think that the best solution is
>>>> not a FileChannelProxy but to modify the Cassandra File class to get a
>>>> FileSystem object from a Factory to build the Path that is used within that
>>>> object.  I think that this makes it a very small change that will pick up
>>>> 90+% of the cases.  We then just need to find the edge cases.
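
The factory idea above could look roughly like the following sketch. `PathFactory` is a hypothetical name, not the Cassandra `File` class or any existing API; the only JDK calls used are `FileSystem.getPath` and `FileSystems.getDefault`.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical illustration: every Path is built through a FileSystem that a
// pluggable supplier hands out, instead of through the default FileSystem.
final class PathFactory {
    private final Supplier<FileSystem> fileSystems;

    PathFactory(Supplier<FileSystem> fileSystems) {
        this.fileSystems = fileSystems;
    }

    // Builds the path on whichever FileSystem the supplier currently returns.
    Path build(String first, String... more) {
        return fileSystems.get().getPath(first, more);
    }

    public static void main(String[] args) {
        // With the default FileSystem this behaves like Paths.get(...).
        PathFactory factory = new PathFactory(FileSystems::getDefault);
        Path sstable = factory.build("data", "ks", "table", "nb-1-big-Data.db");
        System.out.println(sstable + " lives on " + sstable.getFileSystem());
    }
}
```

Swapping the supplier is the single change point in this sketch; code that receives the resulting `Path` keeps using `java.nio.file.Files` as before.
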
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> Super excited about this as well. Happy to help test with Azure and
>>>>> any other way needed.
>>>>>
>>>>> Thanks,
>>>>> German
>>>>> ------------------------------
>>>>> *From:* guo Maxwell <cclive1...@gmail.com>
>>>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable
>>>>> ChannelProxy to alias external storage locations
>>>>>
>>>>> Thanks , So I think a jira can be created now. And I'd be happy to
>>>>> provide some help with this as well if needed.
>>>>>
>>>>> On Thu, Sep 28, 2023 at 00:21, Henrik Ingo <henrik.i...@datastax.com> wrote:
>>>>>
>>>>> It seems I was volunteered to rebase the Astra implementation of this
>>>>> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
>>>>> of course) I'll try to get going today or tomorrow, so that this
>>>>> discussion can then benefit from having that code available for
>>>>> inspection, and potentially from using it as a solution to this use case.
>>>>>
>>>>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
>>>>>
>>>>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>>>>> Works with S3/GCS/Azure.
>>>>>
>>>>> I'll ask someone on our end to make it accessible.
>>>>>
>>>>> This would work by having a bucket prefix per node. But there are lots
>>>>> of details needed to support things like out of bound compaction
>>>>> (mentioned in CEP).
>>>>>
>>>>> Jake
>>>>>
>>>>> On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>>>>> >
>>>>> > I agree with Ariel, the more suitable insertion point is probably
>>>>> the JDK level FileSystemProvider and FileSystem abstraction.
>>>>> >
>>>>> > It might also be that we can reuse existing work here in some cases?
>>>>> >
>>>>> > On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>> >
>>>>> > 
>>>>> > Hi,
>>>>> >
>>>>> > Support for multiple storage backends including remote storage
>>>>> backends is a pretty high value piece of functionality. I am happy to see
>>>>> there is interest in that.
>>>>> >
>>>>> > I think that `ChannelProxyFactory` as an integration point is going
>>>>> to quickly turn into a dead end as we get into really using multiple
>>>>> storage backends. We need to be able to list files and really the full
>>>>> range of filesystem interactions that Java supports should work with any
>>>>> backend to make development, testing, and using existing code
>>>>> straightforward.
>>>>> >
>>>>> > It's a little more work to get C* to create paths for alternate
>>>>> backends where appropriate, but that work is probably necessary even with
>>>>> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
>>>>> Filesystems). There will probably also be backend specific behaviors that
>>>>> show up above the `ChannelProxy` layer that will depend on the backend.
>>>>> >
>>>>> > Ideally there would be some config to specify several backend
>>>>> filesystems and their individual configuration that can be used, as well as
>>>>> configuration and support for a "backend file router" for file creation
>>>>> (and opening) that can be used to route files to the backend most
>>>>> appropriate.
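
One way to picture the "backend file router" described above is the sketch below. `BackendFileRouter`, its rule list, and the keyspace-prefix rule are all assumptions made for illustration; nothing here is an existing Cassandra configuration option or API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical router: picks a backend root for a file from an ordered rule list.
final class BackendFileRouter {
    private static final class Route {
        final Predicate<String> matches;
        final Path root;

        Route(Predicate<String> matches, Path root) {
            this.matches = matches;
            this.root = root;
        }
    }

    private final List<Route> routes = new ArrayList<>();
    private final Path defaultRoot;

    BackendFileRouter(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    // Rules are checked in registration order; the first match wins.
    BackendFileRouter route(Predicate<String> matches, Path root) {
        routes.add(new Route(matches, root));
        return this;
    }

    // Returns where a file with the given relative name should be created or opened.
    Path locate(String relativeName) {
        for (Route route : routes) {
            if (route.matches.test(relativeName)) {
                return route.root.resolve(relativeName);
            }
        }
        return defaultRoot.resolve(relativeName);
    }

    public static void main(String[] args) {
        BackendFileRouter router = new BackendFileRouter(Paths.get("local_data"))
            .route(name -> name.startsWith("archive_ks/"), Paths.get("cold_storage"));
        System.out.println(router.locate("archive_ks/events/nb-1-big-Data.db"));
        System.out.println(router.locate("live_ks/events/nb-1-big-Data.db"));
    }
}
```

Each root could be a `Path` from a different FileSystem, so in this sketch the per-backend configuration would reduce to how those roots are constructed.
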
>>>>> >
>>>>> > Regards,
>>>>> > Ariel
>>>>> >
>>>>> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>>>>> >
>>>>> > I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>>> outside of the standard storage space.
>>>>> >
>>>>> > There are two desires driving this change:
>>>>> >
>>>>> > 1. The ability to temporarily move some keyspaces/tables to storage
>>>>> outside the normal directory tree, on another disk, so that compaction can
>>>>> occur in situations where there is not enough disk space for compaction and
>>>>> the processing of the moved data cannot be suspended.
>>>>> > 2. The ability to store infrequently used data on slower, cheaper
>>>>> storage layers.
>>>>> >
>>>>> > I have a working POC implementation [2] though there are some issues
>>>>> still to be solved and much logging to be reduced.
>>>>> >
>>>>> > I look forward to productive discussions,
>>>>> > Claude
>>>>> >
>>>>> > [1]
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>>> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> http://twitter.com/tjake
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Henrik Ingo
>>>>>
>>>>> c. +358 40 569 7354
>>>>>
>>>>> w. www.datastax.com
>>>>>
>>>>> <https://www.facebook.com/datastax>  <https://twitter.com/datastax>
>>>>> <https://www.linkedin.com/company/datastax/>
>>>>> <https://github.com/datastax/>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> you are the apple of my eye !
>>>>>
>>>>
