I have concerns with the majority of this living in the sidecar and not in the database itself. I think it would make sense for the server side of this to be a new service exposed by the database, not by the sidecar. That way it can properly integrate with the authentication and authorization APIs, and it becomes a first-class citizen in terms of having unit/integration tests in the main DB ensuring no one breaks it.
-Jeremiah

> On Mar 24, 2023, at 10:29 AM, Dinesh Joshi <djo...@apache.org> wrote:
>
> Hi Benjamin,
>
> I agree with your concern about the long-term maintenance of the code. Doug
> has contributed several patches to Cassandra over the years. Besides him,
> there will be several other maintainers taking on maintenance of this code,
> including Yifan and myself. Given how closely it is coupled with the
> Cassandra Sidecar project, I would prefer that we keep this within the
> Cassandra project umbrella as a separate repository and a sub-project.
>
> Thanks,
>
> Dinesh
>
> On 3/24/23 02:35, Benjamin Lerer wrote:
>> Hi Doug,
>>
>> Outside of the changes to the Cassandra Sidecar that are mentioned, what
>> the CEP proposes is the donation of a library for Spark integration. It
>> seems to me that this library could be offered as an open-source project
>> outside of the Cassandra project itself. If we accept Spark Bulk
>> Analytics as part of the Cassandra project, it means that the community
>> will commit to maintaining it and ensuring that for each Cassandra
>> release it is fully compatible. Considering our history with Hadoop
>> integration, which has basically been unmaintained for years, I am not
>> convinced that this is what we should do.
>>
>> We only started to expand the scope of the project recently, and I would
>> personally prefer that we do that slowly, starting with the drivers that
>> are critical for C*. Now, this is only my personal opinion, and other
>> people might have a different view on these things.
>>
>> On Thu, Mar 23, 2023 at 23:29, Miklosovic, Stefan
>> <stefan.mikloso...@netapp.com> wrote:
>>
>> Hi,
>>
>> I think this might be a great contribution in light of the recent removal
>> of the Hadoop integration (CASSANDRA-18323), as it will no longer be in
>> 5.0. If this CEP is adopted and delivered, I can see how it might be a
>> logical replacement for it.
>> Regards
>>
>> ________________________________________
>> From: Doug Rohrer <droh...@apple.com>
>> Sent: Thursday, March 23, 2023 18:33
>> To: dev@cassandra.apache.org
>> Cc: James Berragan
>> Subject: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with
>> Spark Bulk Analytics
>>
>> Hi everyone,
>>
>> Wiki:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>>
>> We’d like to propose this CEP for adoption by the community.
>>
>> It is common for teams using Cassandra to find themselves looking for a
>> way to interact with large amounts of data for analytics workloads.
>> However, Cassandra’s standard APIs aren’t designed for large-scale data
>> egress/ingest, as the native read/write paths weren’t designed for bulk
>> analytics.
>>
>> We’re proposing this CEP for this exact purpose. It enables the
>> implementation of custom Spark (or similar) applications that can either
>> read or write large amounts of Cassandra data at line rate by accessing
>> the persistent storage of nodes in the cluster via the Cassandra Sidecar.
>>
>> This CEP proposes new APIs in the Cassandra Sidecar and a companion
>> library that integrates deeply with Apache Spark, allowing its users to
>> bulk import or export data from a running Cassandra cluster with minimal
>> to no impact on read/write traffic.
>>
>> We will shortly publish a branch with code that will accompany this CEP
>> to help readers understand it better.
>> As a reminder, please keep the discussion here on the dev list vs. in
>> the wiki, as we’ve found it easier to manage via email.
>>
>> Sincerely,
>>
>> Doug Rohrer & James Berragan
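[Editor's note: as background to the CEP summary quoted above — bulk readers of this kind commonly parallelize work by splitting Cassandra's token ring into contiguous sub-ranges, one per Spark task, so each task fetches only the data owned by its range. The sketch below illustrates that splitting for the default Murmur3Partitioner, whose token range is [-2^63, 2^63 - 1]; the function name and structure are illustrative only and are not taken from the CEP branch.]

```python
# Illustrative sketch: splitting the Murmur3 token ring into N contiguous
# sub-ranges, as a bulk reader might do to assign one range per Spark task.
# Names here are hypothetical, not from the CEP-28 code.

MIN_TOKEN = -(2 ** 63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1     # Murmur3Partitioner maximum token

def split_token_ring(num_splits):
    """Divide the full token ring into num_splits contiguous (start, end) ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // num_splits
    splits = []
    start = MIN_TOKEN
    for i in range(num_splits):
        # Last split absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        splits.append((start, end))
        start = end + 1
    return splits

splits = split_token_ring(4)
print(len(splits))                                    # 4
print(splits[0][0] == MIN_TOKEN)                      # True
print(splits[-1][1] == MAX_TOKEN)                     # True
```

In a real connector, each such range would be turned into a CQL predicate (or, per this CEP, a request for the matching SSTable data via the Sidecar) and scheduled as an independent Spark partition.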