Jon,

All of the "big three" support mounting a bucket locally. That being said, I
do not think it is reasonable to completely rule out Cassandra working with
a mount, e.g. just for uploading snapshots there.

GCP

https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket

Azure (this one is quite sophisticated, with a lot of options)

https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL

S3 (there are a lot of options for mounting it)

https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
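
As a minimal sketch of the mount approach, assuming a bucket is already
mounted at some local path (for example a hypothetical /mnt/bucket), copying
a snapshot directory there needs nothing beyond java.nio.file, so no
provider SDK ends up on the classpath. The class name, method name and paths
below are illustrative only and are not part of Cassandra:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class SnapshotToMount {

    // Recursively copies snapshotDir into targetDir. targetDir can be any
    // directory, including one that sits on a FUSE-mounted bucket.
    static void copySnapshot(Path snapshotDir, Path targetDir) throws IOException {
        // Files.walk visits a directory before its children, so parent
        // directories are created before the files inside them are copied.
        try (Stream<Path> paths = Files.walk(snapshotDir)) {
            paths.forEach(src -> {
                Path dst = targetDir.resolve(snapshotDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SnapshotToMount <snapshotDir> <mountedTargetDir>");
            return;
        }
        copySnapshot(Path.of(args[0]), Path.of(args[1]));
    }
}
```

What this does not give you is what Benedict asks about below: every failure
surfaces as a generic IOException from the mount, so the caller cannot easily
tell a throttled request from a missing bucket.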

On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Assuming everything else is identical, might not matter for S3. However,
> not every object store has a filesystem mount.
>
> Regarding sprawling dependencies, we can always make the provider-specific
> libraries available as a separate download and put them on their own thread
> with a separate class path. I think the in-JVM dtest does this already.
> Someone just started asking about IAM for login, it sounds like a similar
> problem.
>
>
> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>
>> I think another way of saying what Stefan may be getting at is what does
>> a library give us that an appropriately configured mount dir doesn’t?
>>
>> We don’t want to treat S3 the same as local disk, but this can be
>> achieved easily with config. Is there some other benefit of direct
>> integration? Well defined exceptions if we need to distinguish cases is one
>> that maybe springs to mind but perhaps there are others?
>>
>>
>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> That is cool, but it still does not show or explain what it would look
>> like when it comes to the dependencies needed for actually talking to
>> storage like S3.
>>
>> Maybe I am missing something here, so please correct me if I am mistaken,
>> but if I understand correctly, to talk to S3 we would need to use a
>> library like this, right? (1) So would that be added to Cassandra's
>> dependencies? Does Cassandra then become biased towards S3? Why S3? Every
>> time somebody comes up with support for a new remote storage, would that
>> be added to the classpath as well? How are these dependencies going to play
>> with each other and with Cassandra in general? Will all these storage
>> provider libraries for arbitrary clouds even be compatible with Cassandra
>> licence-wise?
>>
>> I am sorry I keep repeating these questions, but this is the part that I
>> just don't get at all.
>>
>> We can indeed add an API for this, sure, why not. But for people who do
>> not want to deal with this at all and are fine with just a mounted FS, why
>> would we block them from doing that?
>>
>> (1)
>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>
>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>>
>>>    .
>>>
>>>
>>>> It’s not an area where I can currently dedicate engineering effort. But
>>>> if others are interested in contributing a feature like this, I’d see it as
>>>> valuable for the project and would be happy to collaborate on
>>>> design/architecture/goals.
>>>>
>>>
>>>
>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>
>>> None of us at DataStax has gotten around to providing that, but to
>>> quickly throw something over the wall, this is it:
>>>
>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>
>>>   (with a few friend classes under o.a.c.io.util)
>>>
>>> We then have a RemoteStorageProvider, private in another repo, that
>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>> refers to.
>>>
>>> Hopefully that's a start to get people thinking about CEP-level details,
>>> while we get a cleaned-up abstraction of RemoteStorageProvider and friends
>>> to offer.
>>>
>>>
