Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

Jon Haddad Fri, 07 Mar 2025 08:45:58 -0800

Nobody is saying you can't work with a mount, and this isn't a conversation
about snapshots.


Nobody is forcing users to use object storage either.

You're making a ton of negative assumptions here about both the discussion
and the people you're having it with.  Try to be more open-minded.


On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> The only way I see that working is that, if everything was in a bucket, if
> you take a snapshot, these SSTables would be "copied" from live data dir
> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
> would need to say "and if you go to take a snapshot on this table, instead
> of hardlinking these SSTables, do a copy". But this "copying" would be
> internal to a bucket itself. We would not need to "upload" from node's
> machine to s3.
>
> While this might work, what I find tricky is that we are forcing this on
> users. Not everybody is interested in putting everything in a bucket and
> serving traffic from that. They just don't want to do that. Because reasons.
> They are just happy with what they have etc, it works fine for years and so
> on. They just want to upload SSTables upon snapshotting and call it a day.
>
> I don't think we should force our worldview on them if they are not
> interested in it.
>
> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> BTW, snapshots are quite special because these are not "files", they are
>> just hard links. They "materialize" as regular files once underlying
>> SSTables are compacted away. How are you going to hardlink from local
>> storage to an object storage anyway? We will always need to "upload".
>>
>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> Jon,
>>>
>>> all "big three" support mounting a bucket locally. That being said, I do
>>> not think that completely ditching this possibility for Cassandra working
>>> with a mount, e.g. for just uploading snapshots there etc, is reasonable.
>>>
>>> GCP
>>>
>>>
>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>
>>> Azure (this one is quite sophisticated), lot of options ...
>>>
>>>
>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>
>>> S3, lot of options how to mount that
>>>
>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>
>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com>
>>> wrote:
>>>
>>>> Assuming everything else is identical, might not matter for S3.
>>>> However, not every object store has a filesystem mount.
>>>>
>>>> Regarding sprawling dependencies, we can always make the provider
>>>> specific libraries available as a separate download and put them on their
>>>> own thread with a separate class path. I think the in-JVM dtest does this
>>>> already.  Someone just started asking about IAM for login; it sounds like a
>>>> similar problem.
>>>>
>>>>
>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>>>
>>>>> I think another way of saying what Stefan may be getting at is what
>>>>> does a library give us that an appropriately configured mount dir doesn’t?
>>>>>
>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>> achieved easily with config. Is there some other benefit of direct
>>>>> integration? Well defined exceptions if we need to distinguish cases is one
>>>>> that maybe springs to mind but perhaps there are others?
>>>>>
>>>>>
>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>>>> wrote:
>>>>>
>>>>> 
>>>>>
>>>>> That is cool, but this still does not show / explain what it would look
>>>>> like when it comes to dependencies needed for actually talking to storages
>>>>> like s3.
>>>>>
>>>>> Maybe I am missing something here, and please explain if I am
>>>>> mistaken, but if I understand that correctly, for talking to s3 we would
>>>>> need to use a library like this, right? (1). So that would be added among
>>>>> Cassandra dependencies? Hence Cassandra starts to be biased against s3? Why
>>>>> s3? Every time somebody comes up with a new remote storage support, that
>>>>> would be added to classpath as well? How are these dependencies going to
>>>>> play with each other and with Cassandra in general? Will all these storage
>>>>> provider libraries for arbitrary clouds be even compatible with Cassandra
>>>>> licence-wise?
>>>>>
>>>>> I am sorry I keep repeating these questions, but this is the part that I
>>>>> just don't get at all.
>>>>>
>>>>> We can indeed add an API for this, sure sure, why not. But for people
>>>>> who do not want to deal with this at all and just be OK with a FS mounted,
>>>>> why would we block them doing that?
>>>>>
>>>>> (1)
>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>
>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>>>>>
>>>>>>    .
>>>>>>
>>>>>>
>>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>> But if others are interested in contributing a feature like this, I’d see
>>>>>>> it as valuable for the project and would be happy to collaborate on
>>>>>>> design/architecture/goals.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>>> offer.
>>>>>>
>>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>>> quickly throw something over the wall this is it:
>>>>>>
>>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>>
>>>>>>   (with a few friend classes under o.a.c.io.util)
>>>>>>
>>>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>>>>> refers to.
>>>>>>
>>>>>> Hopefully that's a start to get people thinking about CEP level
>>>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>>>> friends to offer.
>>>>>>
>>>>>>
