On 3/6/2025 7:16 AM, Jon Haddad wrote:
Assuming everything else is identical, it might not matter for S3.
However, not every object store has a filesystem mount.
Regarding sprawling dependencies, we can always make the
provider-specific libraries available as a separate download and put
them on their own thread with a separate class path. I think the in-JVM
dtest does this already. Someone just started asking about IAM for
login; it sounds like a similar problem.
That was me. :-) Cassandra's auth already has fairly well-defined
interfaces and a plug-in mechanism, so it's easy to vend alternative
auth solutions without polluting the main project's dependency graph, at
build-time anyway. A similar approach could be beneficial for CEP-36,
particularly (IMO) for cold-storage purposes. I suspect decoupling
pluggable alternate channel proxies for cold storage from configurable
alternate channel proxies for redirecting data locally (to free up space,
migrate to a different storage device, etc.) would make both easier. The
CEP seems to be trying to do both, but they smell like pretty different
goals to me.
Thanks -- Joel.
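
A minimal sketch of the plug-in idea discussed above: provider-specific jars stay off the main class path and are loaded through their own class loader, so their dependencies do not enter the project's dependency graph. The `ColdStorageChannel` interface and the `PluginLoading` class are hypothetical names made up for illustration; they are not existing Cassandra APIs, and the CEP may end up with a different shape entirely. Only `URLClassLoader` and `ServiceLoader` are real JDK classes here.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

final class PluginLoading {
    // Discovers implementations packaged in jars that are NOT on the main class path.
    // Each jar is expected to register its implementation under META-INF/services.
    static List<ColdStorageChannel> load(List<Path> pluginJars) {
        URL[] urls = new URL[pluginJars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = pluginJars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        // The loader is intentionally left open: classes it defines need it
        // for as long as the returned plug-ins are in use.
        URLClassLoader isolated =
            new URLClassLoader(urls, ColdStorageChannel.class.getClassLoader());
        List<ColdStorageChannel> found = new ArrayList<>();
        for (ColdStorageChannel channel : ServiceLoader.load(ColdStorageChannel.class, isolated)) {
            found.add(channel);
        }
        return found;
    }

    public static void main(String[] args) {
        // With no plug-in jars supplied, nothing provider-specific is loaded.
        System.out.println("plug-ins found: " + load(Collections.emptyList()).size());
    }
}

// Hypothetical plug-in contract (an assumption, not part of Cassandra).
interface ColdStorageChannel {
    String scheme(); // e.g. "s3"; purely illustrative
}
```

The core build only needs the small interface; anything like an AWS SDK would live in the separately downloaded jar.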
On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
I think another way of saying what Stefan may be getting at is
what does a library give us that an appropriately configured mount
dir doesn’t?
We don’t want to treat S3 the same as local disk, but this can be
achieved easily with config. Is there some other benefit of direct
integration? Well-defined exceptions, if we need to distinguish
cases, is one that springs to mind, but perhaps there are others?
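
To make the exceptions point concrete, here is a rough sketch of what code reading through a mounted directory can distinguish using only `java.nio.file`. The `MountedRead` class and its `Outcome` enum are hypothetical names for illustration. The assumption, which depends on the mount implementation, is that remote-specific failures (throttling, expired credentials and the like) would typically arrive as a generic `IOException` and land in the last branch.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MountedRead {
    enum Outcome { OK, MISSING, DENIED, OTHER_IO }

    // Classifies a read using only what the filesystem API reports.
    static Outcome tryRead(Path file) {
        try {
            Files.readAllBytes(file);
            return Outcome.OK;
        } catch (NoSuchFileException e) {
            return Outcome.MISSING;
        } catch (AccessDeniedException e) {
            return Outcome.DENIED;
        } catch (IOException e) {
            // Anything the mount cannot map to a specific filesystem error ends up here.
            return Outcome.OTHER_IO;
        }
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("mounted-read", ".bin");
        System.out.println(tryRead(present));
        Files.delete(present);
        System.out.println(tryRead(present));
        System.out.println(tryRead(Paths.get("no-such-dir", "no-such-file")));
    }
}
```

A direct library integration could, in principle, split that last branch into typed cases; whether that is worth the dependency is the question being asked.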
On 6 Mar 2025, at 08:39, Štefan Miklošovič
<smikloso...@apache.org> wrote:
That is cool, but this still does not show or explain what it would
look like when it comes to the dependencies needed for actually
talking to storage like S3.
Maybe I am missing something here, and please explain if I am
mistaken, but if I understand correctly, for talking to S3 we
would need to use a library like this, right? (1). So that would
be added among Cassandra's dependencies? Hence Cassandra starts to
be biased towards S3? Why S3? Every time somebody comes up with
support for a new remote storage, would that be added to the
classpath as well? How are these dependencies going to play with
each other and with Cassandra in general? Will all these storage
provider libraries for arbitrary clouds even be compatible with
Cassandra licence-wise?
I am sorry I keep repeating these questions, but this is the part
I just don't get at all.
We can indeed add an API for this, sure, why not. But for
people who do not want to deal with this at all and are OK
with a mounted FS, why would we block them from doing that?
(1)
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org>
wrote:
It’s not an area where I can currently dedicate
engineering effort. But if others are interested in
contributing a feature like this, I’d see it as valuable
for the project and would be happy to collaborate on
design/architecture/goals.
Jake mentioned 17 months ago a custom FileSystemProvider we
could offer.
None of us at DataStax has gotten around to providing that,
but to quickly throw something over the wall this is it:
https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
(with a few friend classes under o.a.c.io.util)
We then have a RemoteStorageProvider, private in another
repo, that implements that and also provides the
RemoteFileSystemProvider that Jake refers to.
Hopefully that's a start to get people thinking about CEP
level details, while we get a cleaned-up abstraction of
RemoteStorageProvider and friends to offer.