I have explained multiple times (1) that I have nothing against what is
being discussed here.

Having questions about what that is going to look like does not mean I am
dismissive.

(1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg

On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Nobody is saying you can't work with a mount, and this isn't a
> conversation about snapshots.
>
> Nobody is forcing users to use object storage either.
>
> You're making a ton of negative assumptions here about both the
> discussion, and the people you're having it with.  Try to be more open
> minded.
>
>
> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> The only way I see that working, if everything was in a bucket, is that
>> when you take a snapshot, these SSTables would be "copied" from the live
>> data dir (living in a bucket) to the snapshots dir (living in a bucket).
>> Basically, we would need to say "and if you go to take a snapshot on this
>> table, instead of hardlinking these SSTables, do a copy". But this
>> "copying" would be internal to the bucket itself. We would not need to
>> "upload" from the node's machine to s3.
>>
>> While this might work, what I find tricky is that we are forcing this on
>> users. Not everybody is interested in putting everything in a bucket and
>> serving traffic from it. They just don't want to do that. Because reasons.
>> They are happy with what they have, it has worked fine for years, and so
>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>
>> I don't think we should force our worldview on them if they are not
>> interested in it.
>>
>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> BTW, snapshots are quite special because these are not "files", they are
>>> just hard links. They "materialize" as regular files once underlying
>>> SSTables are compacted away. How are you going to hardlink from local
>>> storage to an object storage anyway? We will always need to "upload".
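
To make the hard-link point concrete, here is a minimal sketch using plain java.nio, not Cassandra's actual snapshot code. The class name, the snapshot() helper and the file name are hypothetical and only for illustration. It shows that a hard link keeps the data readable after the original path is deleted (a stand-in for compaction), which only works when both paths live on the same file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkSnapshotSketch {

    // Hypothetical helper: "snapshot" a single file by hard-linking it
    // into snapshotDir. Not Cassandra's real implementation.
    static Path snapshot(Path sstable, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        Path link = snapshotDir.resolve(sstable.getFileName());
        // createLink needs both paths on the same file system; it throws
        // UnsupportedOperationException where hard links are not supported.
        return Files.createLink(link, sstable);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshot-sketch");
        Path dataDir = Files.createDirectories(root.resolve("data"));
        // Illustrative file name only.
        Path sstable = dataDir.resolve("nb-1-big-Data.db");
        Files.write(sstable, "rows".getBytes(StandardCharsets.UTF_8));

        Path snap = snapshot(sstable, root.resolve("snapshots"));

        // Stand-in for compaction removing the live SSTable.
        Files.delete(sstable);

        // The link still resolves to the same bytes.
        System.out.println(new String(Files.readAllBytes(snap), StandardCharsets.UTF_8));
    }
}
```

Nothing equivalent exists between a local disk and a bucket, so in that setup the snapshot has to become a copy or an upload.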
>>>
>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <
>>> smikloso...@apache.org> wrote:
>>>
>>>> Jon,
>>>>
>>>> all "big three" support mounting a bucket locally. That being said, I
>>>> do not think that completely ditching this possibility for Cassandra
>>>> working with a mount, e.g. for just uploading snapshots there etc, is
>>>> reasonable.
>>>>
>>>> GCP
>>>>
>>>>
>>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>>
>>>> Azure (this one is quite sophisticated), lots of options ...
>>>>
>>>>
>>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>>
>>>> S3, lots of options for mounting it
>>>>
>>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>>
>>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com>
>>>> wrote:
>>>>
>>>>> Assuming everything else is identical, might not matter for S3.
>>>>> However, not every object store has a filesystem mount.
>>>>>
>>>>> Regarding sprawling dependencies, we can always make the provider
>>>>> specific libraries available as a separate download and put them on their
>>>>> own thread with a separate class path. I think the in-JVM dtest does this
>>>>> already.  Someone just started asking about IAM for login, it sounds like
>>>>> a similar problem.
>>>>>
>>>>>
>>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>>>>
>>>>>> I think another way of saying what Stefan may be getting at is: what
>>>>>> does a library give us that an appropriately configured mount dir
>>>>>> doesn’t?
>>>>>>
>>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>>> achieved easily with config. Is there some other benefit of direct
>>>>>> integration? Well defined exceptions if we need to distinguish cases is
>>>>>> one that maybe springs to mind but perhaps there are others?
>>>>>>
>>>>>>
>>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> 
>>>>>>
>>>>>> That is cool but this still does not show / explain what it would look
>>>>>> like when it comes to the dependencies needed for actually talking to
>>>>>> storages like s3.
>>>>>>
>>>>>> Maybe I am missing something here, and please explain if I am
>>>>>> mistaken, but if I understand this correctly, for talking to s3 we would
>>>>>> need to use a library like this, right? (1). So that would be added among
>>>>>> Cassandra dependencies? Hence Cassandra starts to be biased towards s3?
>>>>>> Why s3? Every time somebody comes up with support for a new remote
>>>>>> storage, that would be added to the classpath as well? How are these
>>>>>> dependencies going to play with each other and with Cassandra in general?
>>>>>> Will all these storage provider libraries for arbitrary clouds even be
>>>>>> compatible with Cassandra licence-wise?
>>>>>>
>>>>>> I am sorry I keep repeating these questions but this is the part that I
>>>>>> just don't get at all.
>>>>>>
>>>>>> We can indeed add an API for this, sure, why not. But for people
>>>>>> who do not want to deal with this at all and are just OK with a mounted
>>>>>> FS, why would we block them from doing that?
>>>>>>
>>>>>> (1)
>>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>>
>>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>>    .
>>>>>>>
>>>>>>>
>>>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>>> But if others are interested in contributing a feature like this, I’d
>>>>>>>> see it as valuable for the project and would be happy to collaborate on
>>>>>>>> design/architecture/goals.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>>>> offer.
>>>>>>>
>>>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>>>> quickly throw something over the wall this is it:
>>>>>>>
>>>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>>>
>>>>>>>   (with a few friend classes under o.a.c.io.util)
>>>>>>>
>>>>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>>>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>>>>>> refers to.
>>>>>>>
>>>>>>> Hopefully that's a start to get people thinking about CEP level
>>>>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>>>>> friends to offer.
>>>>>>>
>>>>>>>
