RE: Accessing Cassandra data from Spark Shell

2016-05-10 Thread Mohammed Guller
Yes, it is very simple to access Cassandra data using the Spark shell. Step 1: Launch spark-shell with the spark-cassandra-connector package: $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 Step 2: Create a DataFrame pointing to your Cassandra table
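
A minimal sketch of Step 2 with connector 1.5, assuming the shell was launched as in Step 1 and that sqlContext is the SQLContext provided by spark-shell; the keyspace and table names below are placeholders, not values from the original message:

    // If the contact point was not set at launch, it can be passed with
    // --conf spark.cassandra.connection.host=<node address>
    val df = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

    df.printSchema()   // confirm the DataFrame reflects the Cassandra table
    df.show(10)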

RE: Accessing Cassandra data from Spark Shell

2016-05-18 Thread Mohammed Guller
en.sla...@instaclustr.com] Sent: Tuesday, May 17, 2016 11:00 PM To: user@cassandra.apache.org; Mohammed Guller Cc: user Subject: Re: Accessing Cassandra data from Spark Shell It definitely should be possible for 1.5.2 (I have used it with spark-shell and cassandra connector with 1.4.x). The main trick

RE: Cassandra Summit 2015 Roll Call!

2015-09-22 Thread Mohammed Guller
Hey everyone, I will be at the summit too on Wed and Thu. I am giving a talk on Thursday at 2.40pm. Would love to meet everyone on this list in person. Here is an old picture of mine: https://events.mfactormeetings.com/accounts/register123/mfactor/datastax/events/dstaxsummit2015/guller.jpg Mo

RE: reducing disk space consumption

2016-02-10 Thread Mohammed Guller
If I remember it correctly, C* creates a snapshot when you drop a keyspace. Run the following command to get rid of the snapshot: nodetool clearsnapshot Mohammed Author: Big Data Analytics with Spark From: Ted Yu [mail

RE: Number of columns per row for composite columns?

2014-08-13 Thread Mohammed Guller
4 Mohammed From: hlqv [mailto:hlqvu...@gmail.com] Sent: Tuesday, August 12, 2014 11:44 PM To: user@cassandra.apache.org Subject: Re: Number of columns per row for composite columns? More specifically, I declared a column family: create column family Column_Family with key_validation

no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for storage (I know it is not recommended). We replaced the EBS storage with SSDs. However, we didn't see any change in read latency. A query that took 10 seconds when data was stored on EBS still t

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
...@eventbrite.com] Sent: Tuesday, September 16, 2014 5:42 PM To: user@cassandra.apache.org Subject: Re: no change observed in read latency after switching from EBS to SSD storage On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Does anyone have insight as to why we

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-17 Thread Mohammed Guller
EBS HDD drives to EBS SSD drives? Or instance SSD drives? The m3.large only comes with 32GB of instance based SSD storage. If you're using EBS SSD drives then network will still be the slowest thing so switching won't likely make much of a difference. On Wed, Sep 17, 2014 at 6:00 AM,

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-17 Thread Mohammed Guller
well (can check partition max size from output of nodetool cfstats), may be worth including g to break it up more - but I don't know enough about your data model. --- Chris Lohfink On Sep 17, 2014, at 4:53 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Thank you all for

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-18 Thread Mohammed Guller
we are serving more queries at once than there are cores (in general Cassandra is not designed to serve workloads consisting of single large queries, at least not yet) On Thu, Sep 18, 2014 at 7:29 AM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Chris, I agree that reading 250k

RE: What will be system configuration for retrieving few "GB" of data

2014-10-17 Thread Mohammed Guller
With 8GB RAM, the default heap size is 2GB, so you will quickly start running out of heap space if you do large reads. What is a large read? It depends on the number of columns in each row and the data in each column. It could be 100,000 rows for some and 300,000 for others. In addition, remember that
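
For context, a rough sketch of the arithmetic behind that 2GB figure; it mirrors the heap-size calculation done by cassandra-env.sh in the 2.x line as best I recall it, so treat the exact caps as an assumption rather than a specification:

    // Approximate default heap: max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))
    def defaultMaxHeapMB(systemMemoryMB: Int): Int = {
      val half    = math.min(systemMemoryMB / 2, 1024)   // half of RAM, capped at 1 GB
      val quarter = math.min(systemMemoryMB / 4, 8192)   // quarter of RAM, capped at 8 GB
      math.max(half, quarter)
    }

    defaultMaxHeapMB(8 * 1024)   // = 2048 MB, i.e. the 2GB heap mentioned above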

querying data from Cassandra through the Spark SQL Thrift JDBC server

2014-11-19 Thread Mohammed Guller
Hi - I was curious if anyone is using the Spark SQL Thrift JDBC server with Cassandra. It would be great if you could share how you got it working. For example, what config changes have to be done in hive-site.xml, what additional jars are required, etc.? I have a Spark app that can programm
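
One approach that is sometimes used instead of the standalone Thrift server is to start it from inside a Spark application after registering Cassandra-backed tables. The sketch below assumes Spark 1.3+ built with the Hive Thrift server module and the spark-cassandra-connector on the classpath; the keyspace and table names are placeholders:

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val hiveContext = new HiveContext(sc)   // sc: the existing SparkContext

    // Register a Cassandra table so that JDBC/ODBC clients can query it.
    hiveContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()
      .registerTempTable("my_table")

    // Start the Thrift JDBC/ODBC server against this context.
    HiveThriftServer2.startWithContext(hiveContext)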

batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
Hi - The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5kb and, according to the comments in the yaml file, it is used to log a WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this thre
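
For reference, the property as it appears in cassandra.yaml, with the default mentioned above (the comment is added here for illustration):

    # Log a WARN on any batch whose size exceeds this value in kilobytes.
    batch_size_warn_threshold_in_kb: 5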

RE: batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
people confuse the BATCH keyword as a performance optimization, this helps flag those cases of misuse. On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Hi – The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5k

C* throws OOM error despite use of automatic paging

2015-01-08 Thread Mohammed Guller
Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them and stores a small subset in an RDBMS. Our application is using Datastax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression th
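
A minimal sketch of the automatic-paging read path with the DataStax Java driver 2.1 (the driver version named above); the contact point, keyspace, table, and fetch size are placeholders:

    import com.datastax.driver.core.{Cluster, SimpleStatement}
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")

    val stmt = new SimpleStatement("SELECT * FROM my_table")
    stmt.setFetchSize(1000)   // rows per page; further pages are fetched lazily

    // Iterating the ResultSet pulls pages transparently, so only roughly one
    // page of rows should be resident in the client at a time.
    for (row <- session.execute(stmt).asScala) {
      // filter the row and write the matching subset to the RDBMS here
    }

    cluster.close()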

RE: C* throws OOM error despite use of automatic paging

2015-01-10 Thread Mohammed Guller
Sent: Friday, January 9, 2015 4:02 AM To: user@cassandra.apache.org Subject: Re: C* throws OOM error despite use of automatic paging Hi Mohammed, Zitat von Mohammed Guller : > Hi - > > We have an ETL application that reads all rows from Cassandra (2.1.2), > filters them and stores a s

RE: C* throws OOM error despite use of automatic paging

2015-01-10 Thread Mohammed Guller
data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values? On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Hi – We have an ETL application that reads all rows from Cassandra (2.1.

RE: C* throws OOM error despite use of automatic paging

2015-01-12 Thread Mohammed Guller
it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the cassandra logs before this crashes? On Sat, Jan 10, 2015 at 1:48 PM, Mohammed

Re: C* throws OOM error despite use of automatic paging

2015-01-12 Thread Mohammed Guller
sh in cassandra-env.sh just uncomment JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError" and then run your query again. The heapdump will have the answer. On Tue, Jan 13, 2015 at 10:54 AM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: The heap usage is pretty low (

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
Ruchir, I am curious if you had better luck with the AllRowsReader recipe. Mohammed From: Eric Stevens [mailto:migh...@gmail.com] Sent: Friday, January 16, 2015 12:33 PM To: user@cassandra.apache.org Subject: Re: Retrieving all row keys of a CF Note that getAllRows() is deprecated in Astyanax (s
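
Not the Astyanax recipe under discussion, but for comparison, a sketch of the same task with the CQL Java driver: page through just the partition keys with SELECT DISTINCT and a modest fetch size. The host, key column, and table names are made up for illustration:

    import com.datastax.driver.core.{Cluster, SimpleStatement}
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")

    val stmt = new SimpleStatement("SELECT DISTINCT row_key FROM my_cf")
    stmt.setFetchSize(5000)   // keys per page; paging keeps client memory bounded

    session.execute(stmt).asScala.foreach(r => println(r.getString("row_key")))

    cluster.close()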

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
A few questions: 1) What is the heap size and total memory on each node? 2) How big is the cluster? 3) What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 4) What are the timeouts for the Astyanax client? 5) Do you see GC pressure on the C* node

RE: Retrieving all row keys of a CF

2015-01-16 Thread Mohammed Guller
r new gen and old gen take? occurs every 5 secs, don't see huge gc pressure, <50ms 6) Does any node crash with OOM error when you try AllRowsReader? No From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Friday, January 16, 2015 7:30 PM To: user@cassandra.apache.org<

RE: sharding vs what cassandra does

2015-01-19 Thread Mohammed Guller
Partitioning is similar to sharding. Mohammed From: Adaryl "Bob" Wakefield, MBA [mailto:adaryl.wakefi...@hotmail.com] Sent: Monday, January 19, 2015 8:28 PM To: user@cassandra.apache.org Subject: sharding vs what cassandra does It’s my understanding that the way Cassandra replicates data across

RE: Retrieving all row keys of a CF

2015-01-22 Thread Mohammed Guller
sing the cluster to track your checkpoints, but some other data store (maybe just a flatfile). Again, this is just to give you a sense of what's involved. On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Both total system memory and heap size can’t b

RE: Retrieving all row keys of a CF

2015-01-23 Thread Mohammed Guller
row keys of a CF In each partition, the average number of CQL rows is 200K. Max is 3M. 800K is the number of Cassandra partitions. From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Thursday, January 22, 2015 7:43 PM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subje

RE: Controlling the MAX SIZE of sstables after compaction

2015-01-27 Thread Mohammed Guller
I believe Aegisthus is open sourced. Mohammed From: Jan [mailto:cne...@yahoo.com] Sent: Monday, January 26, 2015 11:20 AM To: user@cassandra.apache.org Subject: Re: Controlling the MAX SIZE of sstables after compaction Parth et al; the folks at Netflix seem to have built a solution for your pr

full-table scan - extracting all data from C*

2015-01-27 Thread Mohammed Guller
Hi - Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can import that data into other analytical tools that provide much richer analytics functionality than C*. Extracting all data from C* is a full-table scan,

RE: Re: full-table scan - extracting all data from C*

2015-01-27 Thread Mohammed Guller
and Spark sc.cassandraTable() work well. I use both of them frequently. At 2015-01-28 04:06:20, "Mohammed Guller" mailto:moham...@glassbeam.com>> wrote: Hi - Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that
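
A small sketch of the sc.cassandraTable() path mentioned above, for anyone following along in spark-shell; the import brings in the connector's implicits, and the keyspace, table, and column names are placeholders:

    import com.datastax.spark.connector._

    // Full-table scan exposed as an RDD of CassandraRow.
    val rdd = sc.cassandraTable("my_keyspace", "my_table")

    println(rdd.count())                                           // total rows
    rdd.map(_.getString("some_column")).take(10).foreach(println)  // sample a column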

RE: Tombstone gc after gc grace seconds

2015-01-29 Thread Mohammed Guller
Ravi – It may help. What version are you running? Do you know if minor compaction is getting triggered at all? One way to check would be to see how many sstables the data directory has. Mohammed From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com] Sent: Thursday, January 29, 2015 1:29 PM To:

RE: Smart column searching for a particular rowKey

2015-02-03 Thread Mohammed Guller
Astyanax allows you to execute CQL statements. I don’t remember the details, but it is there. One tip – when you create the column family, use WITH CLUSTERING ORDER BY (timestamp DESC). Then your query becomes straightforward and C* will do all the heavy lifting for you. Mohammed From: Ravi Agraw
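
A sketch of that schema and read pattern, expressed as CQL run through the Java driver (the table, column, and key names are invented for illustration): with newest-first clustering, "latest N for a key" becomes a simple LIMIT query served in storage order:

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")

    session.execute(
      """CREATE TABLE IF NOT EXISTS events (
        |  row_key text,
        |  ts      timestamp,
        |  value   text,
        |  PRIMARY KEY (row_key, ts)
        |) WITH CLUSTERING ORDER BY (ts DESC)""".stripMargin)

    // Latest 10 events for one key; no client-side sorting needed.
    val latest = session.execute(
      "SELECT ts, value FROM events WHERE row_key = 'k1' LIMIT 10").asScala
    latest.foreach(r => println(s"${r.getDate("ts")} -> ${r.getString("value")}"))

    cluster.close()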

RE: Data tiered compaction and data model question

2015-02-18 Thread Mohammed Guller
What is the maximum number of events that you expect in a day? What is the worst-case scenario? Mohammed From: cass savy [mailto:casss...@gmail.com] Sent: Wednesday, February 18, 2015 4:21 PM To: user@cassandra.apache.org Subject: Data tiered compaction and data model question We want to track

RE: Data tiered compaction and data model question

2015-02-19 Thread Mohammed Guller
if you need C* depending on event size. On Thu, Feb 19, 2015 at 12:00 AM, cass savy mailto:casss...@gmail.com>> wrote: 10-20 per minute is the average. Worst case can be 10x of avg. On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: What is

Spark SQL Thrift JDBC/ODBC server + Cassandra

2015-04-07 Thread Mohammed Guller
Hi - Is anybody using Cassandra with the Spark SQL Thrift JDBC/ODBC server? I can programmatically (within our app) use Spark SQL with C* using the Spark-Cassandra-Connector, but can't find any documentation on how to query C* through the Spark SQL Thrift JDBC/ODBC server. Would appreciate if y

Spark SQL JDBC Server + DSE

2015-05-26 Thread Mohammed Guller
Hi - As I understand, the Spark SQL Thrift/JDBC server cannot be used with the open source C*. Only DSE supports the Spark SQL JDBC server. We would like to find out how many organizations are using this combination. If you do use DSE + Spark SQL JDBC server, it would be great if you c

RE: Spark SQL JDBC Server + DSE

2015-05-28 Thread Mohammed Guller
Anybody out there using DSE + Spark SQL JDBC server? Mohammed From: Mohammed Guller [mailto:moham...@glassbeam.com] Sent: Tuesday, May 26, 2015 6:17 PM To: user@cassandra.apache.org Subject: Spark SQL JDBC Server + DSE Hi - As I understand, the Spark SQL Thrift/JDBC server cannot be used with

RE: Spark SQL JDBC Server + DSE

2015-05-29 Thread Mohammed Guller
y action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. From: Mohammed Guller mailto:moham...@glassbeam.com>> Reply-To: mailto:user@cassandra.apache.org>> Date: Thursday, May 28, 2015 at 8:26 PM To: "user@cassa

RE: Spark SQL JDBC Server + DSE

2015-06-01 Thread Mohammed Guller
mation by persons or entities other than the intended recipient is strictly prohibited. From: Mohammed Guller mailto:moham...@glassbeam.com>> Reply-To: mailto:user@cassandra.apache.org>> Date: Friday, May 29, 2015 at 2:15 PM To: "user@cassandra.apache.org<mailto:user@cassandra.

RE: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Mohammed Guller
Considering that 2.1.6 was just released and it is the first “stable” release ready for production in the 2.1 series, won’t it be too soon to EOL 2.1.x when 3.0 comes out in September? Mohammed From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, June 11, 2015 10:14 AM To: user Subje

RE: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Mohammed Guller
in 2.2).. so 2.2.x and 2.1.x are somewhat synonymous. On Jun 11, 2015, at 8:14 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Considering that 2.1.6 was just released and it is the first “stable” release ready for production in the 2.1 series, won’t it be too soon to EOL 2.1.

RE: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Mohammed Guller
The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality? Mohammed From: Andres de la Peña [mailto:adelap...@stratio.com] Sent: Friday, June 12, 2015 3:43 AM To: user@cassandra.apache.org Subject: Re: Lucene index plugin for Apache Cassandra I

RE: Code review - Spark SQL command-line client for Cassandra

2015-06-19 Thread Mohammed Guller
Hi Matthew, It looks fine to me. I have built a similar service that allows a user to submit a query from a browser and returns the result in JSON format. Another alternative is to leave a Spark shell or one of the notebooks (Spark Notebook, Zeppelin, etc.) session open and run queries from ther
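
A rough sketch of the query-to-JSON piece of such a service (the names are invented, and it assumes the Cassandra tables involved were already registered with the SQLContext, for example through the connector's DataFrame reader):

    import org.apache.spark.sql.SQLContext

    // Run a Spark SQL query over previously registered (e.g. Cassandra-backed)
    // tables and return one JSON document per result row.
    def queryAsJson(sqlContext: SQLContext, query: String): Array[String] =
      sqlContext.sql(query).toJSON.collect()

    // e.g. queryAsJson(sqlContext, "SELECT * FROM my_table LIMIT 10")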

RE: select many rows one time or select many times?

2014-08-01 Thread Mohammed Guller
Did you benchmark these two options: 1) Select with IN 2) Select all words and filter in application Mohammed From: Philo Yang [mailto:ud1...@gmail.com] Sent: Thursday, July 31, 2014 10:45 AM To: user@cassandra.apache.org Subject: select many rows one time or select many times? Hi al
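
The two options side by side, sketched with the Java driver so they could be timed; the host, table, and column names are placeholders rather than anything from the original thread:

    import com.datastax.driver.core.{Cluster, SimpleStatement}
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("my_keyspace")
    val wanted  = Set("w1", "w2", "w3")

    // Option 1: push the selection to Cassandra with IN.
    val inList = wanted.map(w => s"'$w'").mkString(", ")
    val viaIn  = session.execute(s"SELECT * FROM words WHERE word IN ($inList)")
                        .asScala.toList

    // Option 2: read all words (paged) and filter in the application.
    val all = new SimpleStatement("SELECT * FROM words")
    all.setFetchSize(1000)
    val viaFilter = session.execute(all).asScala
                           .filter(r => wanted(r.getString("word"))).toList

    cluster.close()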

Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Mohammed Guller
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data usin

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-24 Thread Mohammed Guller
good advice from other members of the mailing list. Thanks Jabbar Azam On 21 June 2013 18:49, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of t

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread Mohammed Guller
lication in your keyspace and what consistency you are reading with. Also 55MB on disk will not mean 55MB in memory. The data is compressed on disk and also there are other overheads. On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: No deletes. In my t

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-30 Thread Mohammed Guller
--- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 26/06/2013, at 3:57 PM, Mohammed Guller mailto:moham...@glassbeam.com>> wrote: Replication is 3 and read consistency level is one. One of the non-coordinator nodes is crashing, so the O