RE: Accessing Cassandra data from Spark Shell

Mohammed Guller Tue, 10 May 2016 18:49:27 -0700

Yes, it is very simple to access Cassandra data using Spark shell.

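Step 1: Launch spark-shell with the spark-cassandra-connector package:
$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0
(The _2.10 suffix is the Scala version and 1.5.0 is the connector version; pick the artifact that matches your Spark and Scala versions.)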


Step 2: Create a DataFrame pointing to your Cassandra table:
val dfCassTable = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
  .load()

From this point onward, you have complete access to the DataFrame API. You can 
even register it as a temporary table, if you would prefer to use SQL/HiveQL.
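A rough sketch of both routes, for a Spark 1.x spark-shell session started as in Step 1 (where `sqlContext` is predefined). The keyspace `shop`, table `orders` and its columns `customer_id` (text) and `amount` (double) are hypothetical placeholders; substitute your own. The same aggregation is written once with the DataFrame API and once as SQL over a temporary table.

```scala
// Assumes spark-shell with the connector on the classpath and Cassandra reachable.
// HYPOTHETICAL schema: keyspace "shop", table "orders", columns customer_id, amount.
import org.apache.spark.sql.functions.{col, sum}

val orders = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "orders", "keyspace" -> "shop"))
  .load()

// DataFrame API: total amount per customer, counting only orders above 100
val viaApi = orders
  .filter(col("amount") > 100)
  .groupBy("customer_id")
  .agg(sum("amount").as("total"))
viaApi.show()

// Register the DataFrame as a temporary table and express the same query in SQL
orders.registerTempTable("orders")
val viaSql = sqlContext.sql(
  """SELECT customer_id, SUM(amount) AS total
    |FROM orders
    |WHERE amount > 100
    |GROUP BY customer_id""".stripMargin)
viaSql.show()
```

Both queries should produce the same rows. registerTempTable is the Spark 1.x call; in Spark 2.x it still works but is deprecated in favour of createOrReplaceTempView.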

Mohammed
Author: Big Data Analytics with Spark <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Ben Slater [mailto:ben.sla...@instaclustr.com]
Sent: Monday, May 9, 2016 9:28 PM
To: user@cassandra.apache.org; user
Subject: Re: Accessing Cassandra data from Spark Shell

You can use the Spark shell to access Cassandra via the Spark Cassandra connector. 
The getting-started article on our support page should give you a good steer, 
even if you're not using Instaclustr: 
https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-

Cheers
Ben

On Tue, 10 May 2016 at 14:08 Cassa L <lcas...@gmail.com> wrote:
Hi,
Has anyone tried accessing Cassandra data using the Spark shell? How do you do it? 
Can you use HiveContext for Cassandra data? I'm using the community version of 
Cassandra 3.0.

Thanks,
LCassa
--
————————
Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798
