Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

Dinesh Joshi Fri, 24 Mar 2023 08:30:05 -0700

Hi Benjamin,

I agree with your concern about long-term maintenance of the code. Doug
has contributed several patches to Cassandra over the years. Besides him,
several other maintainers, including Yifan and myself, will take on
maintenance of this code. Given how closely it is coupled
with the Cassandra Sidecar project, I would prefer that we keep this
within the Cassandra project umbrella as a separate repository and a
sub-project.


Thanks,

Dinesh


On 3/24/23 02:35, Benjamin Lerer wrote:
> Hi Doug,
> 
> Outside of the changes to the Cassandra Sidecar that are mentioned, what
> the CEP proposes is the donation of a library for Spark integration. It
> seems to me that this library could be offered as an open source project
> outside of the Cassandra project itself. If we accept Spark Bulk
> Analytics as part of the Cassandra project, it means that the community
> will commit to maintaining it and ensuring that it is fully compatible
> with each Cassandra release. Considering our history with Hadoop
> integration, which has basically been unmaintained for years, I am not
> convinced that this is what we should do.
> We only started to expand the scope of the project recently and I would
> personally prefer that we do that slowly starting with the drivers that
> are critical for C*. Now, it is only my personal opinion and other
> people might have a different view on those things.
> 
> On Thu, 23 Mar 2023 at 23:29, Miklosovic, Stefan
> <[email protected] <mailto:[email protected]>> wrote:
> 
>     Hi,
> 
>     I think this might be a great contribution in light of the recent
>     removal of the Hadoop integration (CASSANDRA-18323), which will no
>     longer be in 5.0. If this CEP is adopted and delivered, I can see how
>     it might be a logical replacement for that.
> 
>     Regards
> 
>     ________________________________________
>     From: Doug Rohrer <[email protected] <mailto:[email protected]>>
>     Sent: Thursday, March 23, 2023 18:33
>     To: [email protected] <mailto:[email protected]>
>     Cc: James Berragan
>     Subject: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with
>     Spark Bulk Analytics
> 
>     Hi everyone,
> 
>     Wiki:
>     https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
> 
>     We’d like to propose this CEP for adoption by the community.
> 
>     It is common for teams using Cassandra to find themselves looking
>     for a way to interact with large amounts of data for analytics
>     workloads. However, Cassandra’s standard APIs aren’t designed for
>     large-scale data egress/ingest, as the native read/write paths
>     weren’t built for bulk analytics.
> 
>     We’re proposing this CEP for this exact purpose. It enables the
>     implementation of custom Spark (or similar) applications that can
>     either read or write large amounts of Cassandra data at line rates,
>     by accessing the persistent storage of nodes in the cluster via the
>     Cassandra Sidecar.
> 
>     This CEP proposes new APIs in the Cassandra Sidecar and a companion
>     library that integrates deeply with Apache Spark, allowing its
>     users to bulk import or export data from a running Cassandra
>     cluster with minimal to no impact on the read/write traffic.
> 
>     We will shortly publish a branch with code that will accompany this
>     CEP to help readers understand it better.
> 
>     As a reminder, please keep the discussion here on the dev list
>     rather than in the wiki, as we’ve found it easier to manage via email.
> 
>     Sincerely,
> 
>     Doug Rohrer & James Berragan
>
