Re: Performance problem with large wide row inserts using CQL

2014-02-22 Thread Rüdiger Klaehn
On Fri, Feb 21, 2014 at 11:51 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn rkla...@gmail.com wrote:

 Hi Sylvain,

 I applied the patch to the cassandra-2.0 branch (this required some
 manual work since I could not figure out which commit it was supposed to
 apply for, and it did not apply to the head of cassandra-2.0).


 Yeah, some commit yesterday made the patch not apply cleanly anymore. In
 any case, it's now committed to the cassandra-2.0 branch and will be part
 of 2.0.6.


 The benchmark now runs in pretty much identical time to the thrift-based
 benchmark: ~30s for 1000 inserts of 1 key/value pair each. Great work!


 Glad that it helped.


Thanks for the quick fix. I was really starting to get irritated when the
people at Stack Overflow basically told me that there was something wrong
with my code.



 I still have some questions regarding the mapping. Please bear with me if
 these are stupid questions. I am quite new to Cassandra.

 The basic cassandra data model for a keyspace is something like this,
 right?

 SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
           ^ row key: determines which server(s) the rest is stored on
                             ^ column key
                                           ^ timestamp (latest one wins)
                                                  ^ value (can be size 0)


 It's a reasonable way to think of how things are stored internally, yes.
 Though as DuyHai mentioned, the first map is really sorting by token and in
 general that means you use mostly the sorting of the second map concretely.


Yes, understood.

So the first SortedMap is sorted on some kind of hash of the actual key to
make sure the data gets evenly distributed across the nodes? What if my key
is already a good hash: is there a way to use an identity function as the
hash function (in CQL)? I am thinking about some kind of content-addressed
storage, where the key is a 20-byte SHA-1 hash of the data (like in git).
Obviously this is already a pretty good hash.
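(As background to the question above: Cassandra's partitioner is what turns a partition key into a ring token. The default Murmur3Partitioner always hashes the key, while ByteOrderedPartitioner uses the raw key bytes as the token, effectively an identity mapping. The following is only an illustrative sketch of that difference, with MD5 standing in for the real Murmur3 hash:)

```python
import hashlib

def hashed_token(key: bytes) -> int:
    # Hash-based partitioner (e.g. Murmur3Partitioner): the token is a hash
    # of the key, so data spreads evenly regardless of key distribution.
    # MD5 stands in for Murmur3 purely for illustration.
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")

def byte_ordered_token(key: bytes) -> bytes:
    # Order-preserving partitioner (ByteOrderedPartitioner): the key bytes
    # themselves act as the token. If the key is already a good hash, such
    # as a 20-byte SHA-1 content address, it is not re-hashed.
    return key

sha1_key = hashlib.sha1(b"some blob content").digest()  # already well-distributed
print(len(sha1_key))                             # 20
print(byte_ordered_token(sha1_key) == sha1_key)  # True: identity mapping
```

(Note the partitioner is a cluster-wide setting, not a per-table one, and order-preserving partitioners are generally discouraged unless every table's keys are uniformly distributed.)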



 So if I have a table like the one in my benchmark (using blobs)

 CREATE TABLE IF NOT EXISTS test.wide (
   time blob,
   name blob,
   value blob,
   PRIMARY KEY (time, name)
 ) WITH COMPACT STORAGE;

 From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
 that

 - time maps to the row key and name maps to the column key without any
 overhead
 - value directly maps to value in the model above without any prefix

 is that correct, or is there some overhead involved in CQL over the raw
 model as described above? If so, where exactly?


 That's correct.
 For completeness sake, if you were to remove the COMPACT STORAGE, there
 would be some overhead in how it maps to the underlying column key, but
 that overhead would buy you much more flexibility in how you could evolve
 this table schema (you could add more CQL columns later if needs be, have
 collections or have static columns following CASSANDRA-6561 that comes in
 2.0.6; none of which you can have with COMPACT STORAGE). Note that it's
 perfectly fine to use COMPACT STORAGE if you know you don't and won't need
 the additional flexibility, but I generally advise people to actually check
 first that using COMPACT STORAGE does make a concrete and meaningful
 difference for their use case (be careful with premature optimization
 really).
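(The compact-storage mapping confirmed above can be mimicked in a few lines. This is only a conceptual model of the SortedMap-of-SortedMaps view described earlier in the thread, not how the storage engine is actually implemented:)

```python
# Conceptual model: row key -> column key -> (timestamp, value).
# Plain dicts stand in for the sorted maps; sorting is applied on read.
store = {}

def insert(row_key, col_key, value, timestamp):
    columns = store.setdefault(row_key, {})
    current = columns.get(col_key)
    # Last-write-wins: the cell with the highest timestamp survives.
    if current is None or timestamp >= current[0]:
        columns[col_key] = (timestamp, value)

def read_row(row_key):
    # Columns come back sorted by column key, like a wide-row slice.
    return sorted(store.get(row_key, {}).items())

# With COMPACT STORAGE the table maps directly:
#   time -> row key, name -> column key, value -> cell value.
insert(b"t1", b"name-b", b"v1", timestamp=1)
insert(b"t1", b"name-a", b"v2", timestamp=1)
insert(b"t1", b"name-a", b"v3", timestamp=2)  # newer write wins

print(read_row(b"t1"))
# [(b'name-a', (2, b'v3')), (b'name-b', (1, b'v1'))]
```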


In this case I am confident that the schema will not change. But there will
be other tables built from the same data where I am not going to use
compact storage.

cheers,

Rüdiger


Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi,

I need to decouple some of the work being processed from the user thread to 
provide a better user experience. For that I need a queuing system with the 
following needs:
- High Availability
- No Data Loss
- Better Performance

Following are some libraries that were considered, along with the limitations I see:
- Redis - data loss
- ZooKeeper - not advised for a queue system
- TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform best. With 
the replication requirement, I would probably have to look at Apache 
ActiveMQ+LevelDB.

After checking on the third option above, I wonder if Cassandra with 
Leveled Compaction offers a similar system. Do you see any issues in such a 
usage, or are there better solutions available?


Will be great to get insights on this.


Regards,
Jagan



Re: Disabling opscenter data collection in Datastax community 2.0

2014-02-22 Thread Michael Shuler

On 02/22/2014 06:12 AM, user 01 wrote:

I'm using dsc20 (DataStax Community edition for Cassandra 2.0) in a
production environment, but I am not authorized to use OpsCenter for
production use. How do I disable the data recording that is being done for
OpsCenter consumption? It is unusable for me & will put unnecessary load on
my machine.


The agent is lightweight.  How are you planning to monitor your 
production env?


You didn't hint at how you installed DSC/OpsCenter, but stop the agents, 
uninstall the agents, and drop the keyspace.  The details of those steps 
depend on how you installed (rpm, deb, tar).


http://www.datastax.com/documentation/opscenter/4.0/opsc/reference/opscInstallLocations_g.html
http://www.datastax.com/documentation/opscenter/4.0/opsc/online_help/opscRemovingPackages_t.html

Those docs might be helpful.  Let us know how you installed DSC, if you 
need some better details.


--
Kind regards,
Michael


Re: Queuing System

2014-02-22 Thread DuyHai Doan
Jagan

 Queue-like data structures are known to be one of the worst anti-patterns
for Cassandra:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets



On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi,

 I need to decouple some of the work being processed from the user thread
 to provide better user experience. For that I need a queuing system with
 the following needs,

- High Availability
- No Data Loss
- Better Performance.

 Following are some libraries that were considered along with the
 limitation I see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing
better. With replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra
 with Leveled Compaction offers a similar system. Do you see any issues in
 such a usage or are there better solutions available?

 Will be great to get insights on this.

 Regards,
 Jagan



Re: Queuing System

2014-02-22 Thread Laing, Michael
We use RabbitMQ for queuing and Cassandra for persistence.

RabbitMQ with clustering and/or federation should meet your high
availability needs.

Michael


On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Jagan

  Queue-like data structures are known to be one of the worst anti-patterns
 for Cassandra:
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets



 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi,

 I need to decouple some of the work being processed from the user thread
 to provide better user experience. For that I need a queuing system with
 the following needs,

- High Availability
- No Data Loss
- Better Performance.

 Following are some libraries that were considered along with the
 limitation I see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing
better. With replication requirement, I probably have to look at
Apache ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra
 with Leveled Compaction offers a similar system. Do you see any issues in
 such a usage or are there better solutions available?

 Will be great to get insights on this.

 Regards,
 Jagan





Re: Queuing System

2014-02-22 Thread Tupshin Harper
While, historically, it has been true that queuing in Cassandra has been an
anti-pattern, it is also true that Leveled Compaction addresses the worst
aspect of frequent deletes in Cassandra, and that overall, queuing in
Cassandra is nowhere near the anti-pattern that it used to be. This is
something that I've been meaning to write about more extensively.

 If your requirements are more around availability (particularly multi-dc)
and reliability with moderate (not extreme) performance, it is quite
possible to build a pretty decent system on top of Cassandra. You don't
mention your throughput requirements, nor additional semantics that might
be necessary (e.g. deliver at-least-once vs deliver exactly-once), but
Cassandra 2.0's lightweight transactions provide a CAS primitive that can
be used to ensure deliver-once if that is a requirement.
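(As an aside, the deliver-once idea via CAS can be sketched with an in-memory stand-in. In Cassandra this would be a lightweight transaction, i.e. a conditional UPDATE on an "owner" column; the queue/owner schema here is hypothetical and the exact CQL syntax varies by version:)

```python
import threading

class CasQueue:
    """Toy in-memory stand-in for CAS-based message claiming."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, msg_id, consumer_id):
        # Atomically set the owner only if the message is unclaimed,
        # mirroring a conditional (IF ...) update in CQL.
        with self._lock:
            if self._owners.get(msg_id) is None:
                self._owners[msg_id] = consumer_id
                return True   # this consumer won the CAS
            return False      # someone else already claimed it

q = CasQueue()
print(q.claim("msg-1", "consumer-A"))  # True: first claim succeeds
print(q.claim("msg-1", "consumer-B"))  # False: CAS fails, already owned
```

(The losing consumer simply moves on to the next message, so each message is processed by exactly one consumer even under concurrency.)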

I'd be happy to continue discussing appropriate data-models and access
patterns if you decide to go down this path.

-Tupshin


On Sat, Feb 22, 2014 at 10:03 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi,

 I need to decouple some of the work being processed from the user thread
 to provide better user experience. For that I need a queuing system with
 the following needs,

- High Availability
- No Data Loss
- Better Performance.

 Following are some libraries that were considered along with the
 limitation I see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing
better. With replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra
 with Leveled Compaction offers a similar system. Do you see any issues in
 such a usage or are there better solutions available?

 Will be great to get insights on this.

 Regards,
 Jagan



Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi Michael,

Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which 
will give better performance: writing directly into Rabbit with ack support, 
vs a temporary queue in Cassandra first and then dequeuing and publishing to Rabbit.

Weighing the complexities, handling scenarios like Rabbit connection failures etc 
vs Cassandra write performance and replication with hinted handoff support etc, 
makes me wonder if this is a better path.


Regards,
Jagan

 On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing <michael.la...@nytimes.com> wrote 

 We use RabbitMQ for queuing and Cassandra for persistence.

 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.

 Michael

 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
  Jagan

  Queue-like data structures are known to be one of the worst anti-patterns 
  for Cassandra:
  http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan <ja...@zohocorp.com> wrote:
  Hi,

  I need to decouple some of the work being processed from the user thread to 
  provide better user experience. For that I need a queuing system with the 
  following needs,
  - High Availability
  - No Data Loss
  - Better Performance.

  Following are some libraries that were considered along with the limitation I 
  see,
  - Redis - Data Loss
  - ZooKeeper - Not advised for Queue system.
  - TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing better. 
  With replication requirement, I probably have to look at Apache 
  ActiveMQ+LevelDB.

  After checking on the third option above, I kind of wonder if Cassandra with 
  Leveled Compaction offers a similar system. Do you see any issues in such a 
  usage or are there better solutions available?

  Will be great to get insights on this.

  Regards,
  Jagan

Re: Queuing System

2014-02-22 Thread Joe Stein
If performance and availability for messaging is a requirement then use Apache 
Kafka http://kafka.apache.org/

You can pass the same thrift/avro objects through the Kafka commit log or 
strings or whatever you want.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi Michael,
 
 Yes I am planning to use RabbitMQ for my messaging system. But I wonder which 
 will give better performance if writing directly into Rabbit with Ack support 
 Vs a temporary Queue in Cassandra first and then dequeue and publish in 
 Rabbit.
 
 Complexities involving - Handling scenarios like Rabbit Connection failure 
 etc Vs Cassandra write performance and replication with hinted handoff 
 support etc, makes me wonder if this is a better path.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing 
 michael.la...@nytimes.com wrote  
 
 We use RabbitMQ for queuing and Cassandra for persistence.
 
 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.
 
 Michael
 
 
 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Jagan 
 
  Queue-like data structures are known to be one of the worst anti-patterns 
   for Cassandra:  
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
 
 
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:
 Hi,
 
 I need to decouple some of the work being processed from the user thread to 
 provide better user experience. For that I need a queuing system with the 
 following needs,
 High Availability
 No Data Loss
 Better Performance.
 Following are some libraries that were considered along with the limitation I 
 see,
 Redis - Data Loss
 ZooKeeper - Not advised for Queue system.
 TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing better. 
 With replication requirement, I probably have to look at Apache 
 ActiveMQ+LevelDB.
 After checking on the third option above, I kind of wonder if Cassandra with 
 Leveled Compaction offers a similar system. Do you see any issues in such a 
 usage or are there better solutions available?
 
 Will be great to get insights on this.
 
 Regards,
 Jagan
 
 
 


Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi Joe,

If my understanding is right, Kafka does not satisfy the high 
availability/replication part well, because of the need for a leader and 
in-sync replicas.


Regards,
Jagan

 On Sat, 22 Feb 2014 22:02:27 +0530 Joe Stein <crypt...@gmail.com> wrote 

 If performance and availability for messaging is a requirement then use Apache 
 Kafka http://kafka.apache.org/

 You can pass the same thrift/avro objects through the Kafka commit log or 
 strings or whatever you want.

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop
 /

 On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan <ja...@zohocorp.com> wrote:

  Hi Michael,

  Yes I am planning to use RabbitMQ for my messaging system. But I wonder which 
  will give better performance: writing directly into Rabbit with ack support 
  vs a temporary queue in Cassandra first and then dequeuing and publishing to 
  Rabbit.

  Complexities involving handling scenarios like Rabbit connection failure etc 
  vs Cassandra write performance and replication with hinted handoff support 
  etc make me wonder if this is a better path.

  Regards,
  Jagan

Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi,

Thanks for the pointer. 


Following are some options given there:
- If you know where your live data begins, hint Cassandra with a start column, 
to reduce the scan times and the amount of tombstones to collect.
- A broker will usually have some notion of what's next in the sequence and 
thus be able to do much more targeted queries, down to a single record if the 
storage strategy were to choose monotonic sequence numbers.

What we need to do is apply some intelligence in using the system and avoid 
tombstones: either use the suggested column name directly, or use a proper 
start column if a slice query is used.

Is that right, or am I missing something here?


Regards,
Jagan

 On Sat, 22 Feb 2014 20:55:39 +0530 DuyHai Doan <doanduy...@gmail.com> wrote 

  Jagan

  Queue-like data structures are known to be one of the worst anti-patterns 
  for Cassandra:
  http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan <ja...@zohocorp.com> wrote:
  Hi,

  I need to decouple some of the work being processed from the user thread to 
  provide better user experience. For that I need a queuing system with the 
  following needs,
  - High Availability
  - No Data Loss
  - Better Performance.

  Following are some libraries that were considered along with the limitation I 
  see,
  - Redis - Data Loss
  - ZooKeeper - Not advised for Queue system.
  - TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing better. 
  With replication requirement, I probably have to look at Apache 
  ActiveMQ+LevelDB.

  After checking on the third option above, I kind of wonder if Cassandra with 
  Leveled Compaction offers a similar system. Do you see any issues in such a 
  usage or are there better solutions available?

  Will be great to get insights on this.

  Regards,
  Jagan

Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Thanks Tupshin for your assistance. As I mentioned in the other mail, yes, I am 
planning to use RabbitMQ for my messaging system. But I wonder which will give 
better performance: writing directly into Rabbit with ack support, vs a 
temporary queue in Cassandra first and then dequeuing and publishing to Rabbit. 
I use Rabbit for messaging because of the routing and push-model communication. 
So I am thinking of using Cassandra as a temporary queue, which gives fast 
write performance with no data loss, vs waiting for a Rabbit ack at the 
application level or handling Rabbit reconnection vs Cassandra hinted-handoff 
writes.

So Cassandra might aggregate all my message queue entries temporarily before I 
publish them to Rabbit. Is this fine? If so, please share your insight on which 
model & access pattern would be a better fit for this usage. Throughput 
requirements may be around say 100 ops/sec.


Regards,
Jagan


 On Sat, 22 Feb 2014 21:10:36 +0530 Tupshin Harper <tups...@tupshin.com> wrote 

 While, historically, it has been true that queuing in Cassandra has been an 
 anti-pattern, it is also true that Leveled Compaction addresses the worst 
 aspect of frequent deletes in Cassandra, and that overall, queuing in Cassandra 
 is nowhere near the anti-pattern that it used to be. This is something that 
 I've been meaning to write about more extensively.

 If your requirements are more around availability (particularly multi-dc) and 
 reliability with moderate (not extreme) performance, it is quite possible to 
 build a pretty decent system on top of Cassandra. You don't mention your 
 throughput requirements, nor additional semantics that might be necessary (e.g. 
 deliver at-least-once vs deliver exactly-once), but Cassandra 2.0's lightweight 
 transactions provide a CAS primitive that can be used to ensure deliver-once if 
 that is a requirement.

 I'd be happy to continue discussing appropriate data-models and access 
 patterns if you decide to go down this path.

 -Tupshin

 On Sat, Feb 22, 2014 at 10:03 AM, Jagan Ranganathan <ja...@zohocorp.com> wrote:
  Hi,

  I need to decouple some of the work being processed from the user thread to 
  provide better user experience. For that I need a queuing system with the 
  following needs,
  - High Availability
  - No Data Loss
  - Better Performance.

  Following are some libraries that were considered along with the limitation I 
  see,
  - Redis - Data Loss
  - ZooKeeper - Not advised for Queue system.
  - TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing better. 
  With replication requirement, I probably have to look at Apache 
  ActiveMQ+LevelDB.

  After checking on the third option above, I kind of wonder if Cassandra with 
  Leveled Compaction offers a similar system. Do you see any issues in such a 
  usage or are there better solutions available?

  Will be great to get insights on this.

  Regards,
  Jagan



Re: List support in Net::Async::CassandraCQL ?

2014-02-22 Thread Paul LeoNerd Evans
(resending for the list now I'm subscribed)

On Sat, 22 Feb 2014 14:03:06 +1100
Jacob Rhoden jacob.rho...@me.com wrote:

 This perl library has been extremely useful for scripting up data
 migrations. I wonder if anyone knows of the easiest way to use lists
 with this driver? Throwing a perl array in as a parameter doesn’t
 work as is:
 
 my $q = $cass->prepare("update contact set name=?, address=? where uuid=?")->get;
 push @f, $q->execute([$name, @address, $uuid]);
 Future->needs_all( @f )->get;
 
 Returns the following:
 
 Cannot encode address: not an ARRAY
 at /usr/local/share/perl/5.14.2/Net/Async/CassandraCQL/Query.pm line
 182

It needs to arrive as an ARRAY ref:

  $q->execute([$name, \@address, $uuid]);

Or if you'd prefer you can use named bindings:

  $q->execute({name => $name, address => \@address, uuid => $uuid});

-- 
Paul LeoNerd Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Queuing System

2014-02-22 Thread DuyHai Doan
Jagan

Some time ago I dealt with a similar queuing design for one customer.

If you never delete messages in the queue, then it is possible to use
wide rows with bucketing and a monotonically increasing column name to store
messages.

CREATE TABLE read_only_queue (
   bucket_number int,
   insertion_time timeuuid,
   message text,
   PRIMARY KEY(bucket_number, insertion_time)
);

 Let's say that you allow only 100 000 messages per partition (physical
row) to avoid too-wide rows; then inserting into/reading from the table
read_only_queue is easy.

 For the message producer:

   1) Start at bucket_number = 1
   2) Insert messages with column name = generated timeUUID with
micro-second precision (depending on whether the insertion rate is high or
not)
   3) If message count = 100 000, increment bucket_number by one and go to
2)

For the message reader:

   1) Start at bucket_number = 1
   2) Read messages by slices of N; save the insertion_time of the last
read message
   3) Use the saved insertion_time to perform the next slice query
   4) If read message count = 100 000, increment bucket_number and go to
2). Keep the insertion_time; do not reset it, since its value is
increasing monotonically

For multiple and concurrent producers and consumers, there is a trick. Let's
assume you have P concurrent producers and C concurrent consumers.

  Assign a numerical ID to each producer and consumer. First producer ID =
1 ... last producer ID = P. Same for consumers.

  - re-use the above algorithm
  - each producer/consumer starts at bucket_number = his ID
  - at the end of the row,
    - next bucket_number = current bucket_number + P for producers
    - next bucket_number = current bucket_number + C for consumers

The last thing to take care of is compaction configuration, to reduce the
number of SSTables on disk.

If you manage to avoid accumulation effects (i.e. the reading rate is
faster than the writing rate), messages are likely to be consumed while
still in memory (in the memtable) on the server side. In this particular
case, you can optimize further by deactivating compaction for the table.
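(The bucketing scheme above can be sketched as an in-memory model. A monotonically increasing counter stands in for the timeuuid, and the bucket size is shrunk to 3 for illustration; this is only a sketch of the algorithm, not a Cassandra client:)

```python
import itertools

BUCKET_SIZE = 3             # stands in for the 100 000 messages/partition limit
_clock = itertools.count()  # monotonic counter standing in for timeuuid

buckets = {}  # bucket_number -> list of (insertion_time, message)

def produce(message, state):
    # state tracks the producer's current bucket and rows written to it
    if state["count"] >= BUCKET_SIZE:
        state["bucket"] += 1          # roll over to the next partition
        state["count"] = 0
    buckets.setdefault(state["bucket"], []).append((next(_clock), message))
    state["count"] += 1

def consume(state, slice_size=2):
    # Read a slice after the last seen insertion_time; keep last_time across
    # bucket rollovers, since the "timeuuid" is globally monotonic.
    row = buckets.get(state["bucket"], [])
    out = [(t, m) for t, m in row if t > state["last_time"]][:slice_size]
    if out:
        state["last_time"] = out[-1][0]
    elif len(row) >= BUCKET_SIZE:
        state["bucket"] += 1          # bucket exhausted: move to the next one
    return [m for _, m in out]

prod = {"bucket": 1, "count": 0}
cons = {"bucket": 1, "last_time": -1}
for i in range(5):
    produce(f"msg-{i}", prod)

seen = []
for _ in range(6):
    seen.extend(consume(cons))
print(seen)  # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
```

(For P concurrent producers or C consumers, the rollover lines would step the bucket by P or C instead of 1, exactly as described above.)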

Regards

 Duy Hai

On Sat, Feb 22, 2014 at 5:56 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi,

 Thanks for the pointer.

 Following are some options given there,

- If you know where your live data begins, hint Cassandra with a start
column, to reduce the scan times and the amount of tombstones to collect.
-  A broker will usually have some notion of what's next in the
sequence and thus be able to do much more targeted queries, down to a
single record if the storage strategy were to choose monotonic sequence
numbers.

 What we need to do is apply some intelligence in using the system and avoid
 tombstones: either use the suggested column name or a proper start column if
 a slice query is used.

 Is that right, or am I missing something here?

 Regards,
 Jagan

  On Sat, 22 Feb 2014 20:55:39 +0530 DuyHai Doan doanduy...@gmail.com wrote 

  Jagan

   Queue-like data structures are known to be one of the worst
 anti-patterns for Cassandra:
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets



 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

  Hi,

  I need to decouple some of the work being processed from the user thread
 to provide better user experience. For that I need a queuing system with
 the following needs,

- High Availability
- No Data Loss
- Better Performance.

 Following are some libraries that were considered along with the
 limitation I see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing
better. With replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra
 with Leveled Compaction offers a similar system. Do you see any issues in
 such a usage or are there better solutions available?

 Will be great to get insights on this.

 Regards,
 Jagan






Re: Queuing System

2014-02-22 Thread thunder stumpges
We use this same setup also and it works great. 
Thunder

- Reply message -
From: Laing, Michael michael.la...@nytimes.com
To: user@cassandra.apache.org
Subject: Queuing System
Date: Sat, Feb 22, 2014 7:31 AM

We use RabbitMQ for queuing and Cassandra for persistence.
RabbitMQ with clustering and/or federation should meet your high availability 
needs.

Michael



On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:

Jagan




 Queue-like data structures are known to be one of the worst anti-patterns for 
Cassandra: 
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets






On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:


Hi,
I need to decouple some of the work being processed from the user thread to 
provide better user experience. For that I need a queuing system with the 
following needs,


- High Availability
- No Data Loss
- Better Performance.

Following are some libraries that were considered along with the limitation I 
see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing 
better. With replication requirement, I probably have to look at Apache 
ActiveMQ+LevelDB.


After checking on the third option above, I kind of wonder if Cassandra with 
Leveled Compaction offers a similar system. Do you see any issues in such a 
usage or are there better solutions available?



Will be great to get insights on this.

Regards,
Jagan

Re: Queuing System

2014-02-22 Thread thunder stumpges
This seems a bit overkill. We run far more than 100 msgs/sec (closer to 600) in 
Rabbit with very good latency on a 3-node cluster. It has been very reliable as 
well.
Thunder

- Reply message -
From: Jagan Ranganathan ja...@zohocorp.com
To: user@cassandra.apache.org
Subject: Queuing System
Date: Sat, Feb 22, 2014 9:06 AM

Thanks Tupshin for your assistance. As I mentioned in the other mail, yes, I am 
planning to use RabbitMQ for my messaging system. But I wonder which will give 
better performance: writing directly into Rabbit with ack support, vs a 
temporary queue in Cassandra first and then dequeuing and publishing to Rabbit. 
I use Rabbit for messaging because of the routing and push-model communication. 
So I am thinking of using Cassandra as a temporary queue, which gives fast 
write performance with no data loss, vs waiting for a Rabbit ack at the 
application level or handling Rabbit reconnection vs Cassandra hinted-handoff 
writes.

So Cassandra might aggregate all my message queue entries temporarily before I 
publish them to Rabbit. Is this fine? If so, please share your insight on which 
model & access pattern would be a better fit for this usage. Throughput 
requirements may be around say 100 ops/sec.

Regards,
Jagan

 On Sat, 22 Feb 2014 21:10:36 +0530 Tupshin Harper tups...@tupshin.com 
wrote  

While, historically, it has been true that queuing in Cassandra has been an 
anti-pattern, it is also true that Leveled Compaction addresses the worst 
aspect of frequent deletes in Cassandra, and that overall, queuing in 
Cassandra is nowhere near the anti-pattern that it used to be. This is 
something that I've been meaning to write about more extensively.

If your requirements are more around availability (particularly multi-dc) 
and reliability with moderate (not extreme) performance, it is quite 
possible to build a pretty decent system on top of Cassandra. You don't 
mention your throughput requirements, nor additional semantics that might 
be necessary (e.g. deliver at-least-once vs deliver exactly-once), but 
Cassandra 2.0's lightweight transactions provide a CAS primitive that 
can be used to ensure deliver-once if that is a requirement.

I'd be happy to continue discussing appropriate data-models and access 
patterns if you decide to go down this path.

-Tupshin


On Sat, Feb 22, 2014 at 10:03 AM, Jagan Ranganathan 
ja...@zohocorp.com wrote:
 Hi,

 I need to decouple some of the work being processed from the user thread 
 to provide better user experience. For that I need a queuing system with 
 the following needs,
 - High Availability
 - No Data Loss
 - Better Performance.

 Following are some libraries that were considered along with the 
 limitation I see,
 - Redis - Data Loss
 - ZooKeeper - Not advised for Queue system.
 - TokyoCabinet/SQLite/LevelDB - of this LevelDB seems to be performing 
 better. With replication requirement, I probably have to look at Apache 
 ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra 
 with Leveled Compaction offers a similar system. Do you see any issues in 
 such a usage or are there better solutions available?

 Will be great to get insights on this.

 Regards,
 Jagan

[OT]: Can I have a non-delivering subscription?

2014-02-22 Thread Paul LeoNerd Evans
A question about the mailing list itself, rather than Cassandra.

I've re-subscribed simply because I have to be subscribed in order to
send to the list, as I sometimes try to when people Cc questions about
my Net::Async::CassandraCQL perl module to me. However, if I want to
read the list, I usually do so on the online archives and not by mail.

Is it possible to have a non-delivering subscription, which would let
me send messages, but doesn't deliver anything back to me?

-- 
Paul LeoNerd Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Queuing System

2014-02-22 Thread Joe Stein
Without them you have no durability.  

With them you have guarantees... More than any other system with messaging 
features. It is a durable CP commit log. Works very well for data pipelines 
with AP systems like Cassandra, which is a different system solving different 
problems. When a Kafka leader fails you might block and wait for 10ms 
while a new leader is elected, but writes can be guaranteed.

The consumers then read and process data and write to Cassandra. And then have 
your app read from Cassandra for what was processed.

These are very typical type architectures at scale 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Feb 22, 2014, at 11:49 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi Joe,
 
 If my understanding is right, Kafka does not satisfy the high 
 availability/replication part well because of the need for leader and In-Sync 
 replicas. 
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 22:02:27 +0530 Joe Stein crypt...@gmail.com wrote:
  
 
 If performance and availability for messaging is a requirement then use 
 Apache Kafka http://kafka.apache.org/
 
 You can pass the same thrift/avro objects through the Kafka commit log or 
 strings or whatever you want.
 
 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop
 /
 
 
 On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:
 
 Hi Michael,
 
 Yes I am planning to use RabbitMQ for my messaging system. But I wonder which 
 will give better performance: writing directly into Rabbit with Ack support 
 vs a temporary queue in Cassandra first, and then dequeue and publish in 
 Rabbit.
 
 The complexities involved (handling scenarios like Rabbit connection failure 
 etc. vs Cassandra write performance and replication with hinted handoff 
 support etc.) make me wonder if this is the better path.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing 
 michael.la...@nytimes.com wrote  
 
 We use RabbitMQ for queuing and Cassandra for persistence.
 
 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.
 
 Michael
 
 
 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Jagan 
 
  Queue-like data structures are known to be one of the worst anti-patterns 
 for Cassandra:  
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
 
 
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:
 Hi,
 
 I need to decouple some of the work being processed from the user thread to 
 provide better user experience. For that I need a 
 queuing system with the following needs,
 High Availability
 No Data Loss
 Better Performance.
 Following are some libraries that were considered along with the limitation I 
 see,
 Redis - Data Loss
 ZooKeeper - Not advised for Queue system.
 TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing better. 
 With replication requirement, I probably have to look at Apache 
 ActiveMQ+LevelDB.
 After checking on the third option above, I kind of wonder if Cassandra with 
 Leveled Compaction offer a similar system. Do you see any issues in such a 
 usage or is there other better solutions available.
 
 Will be great to get insights on this.
 
 Regards,
 Jagan
 
 
 
 


Loading CQL PDO for CentOS PHP

2014-02-22 Thread Spencer Brown
I'm trying to get CQL going for my CentOS 5 cassandra PHP platform.
I've installed
thrift, but when I try to make cassandra-pdo or YACassandraPDO for that
matter, none of the tests pass.  And when I install it with PHP, phpinfo
still doesn't show it loading and it doesn't work.

Any ideas would be appreciated.  There are pretty good instructions here -
https://code.google.com/a/apache-extras.org/p/cassandra-pdo/ - for
other platforms.  But I can't find anything devoted to CentOS.

Spencer


Re: Update multiple rows in a CQL lightweight transaction

2014-02-22 Thread Tupshin Harper
#5633 was actually closed because of the static columns feature (
https://issues.apache.org/jira/browse/CASSANDRA-6561), which has been
checked in to the 2.0 branch but is not yet part of a release (it will be
in 2.0.6).

That feature will let you update multiple rows within a single partition by
doing a CAS write based on a static column shared by all rows within the
partition.

Example extracted from the ticket:
CREATE TABLE foo (
    x text,
    y bigint,
    t bigint static,
    z bigint,
    PRIMARY KEY (x, y)
);

insert into foo (x,y,t, z) values ('a', 1, 1, 10);
insert into foo (x,y,t, z) values ('a', 2, 2, 20);

select * from foo;

 x | y | t | z
---+---+---+----
 a | 1 | 2 | 10
 a | 2 | 2 | 20

(Note that both values of t are 2 because it is static: the second insert
overwrote the first.)


 begin batch update foo set z = 1 where x = 'a' and y = 1; update foo set z
= 2 where x = 'a' and y = 2 if t = 4; apply batch;

 [applied] | x | y    | t
-----------+---+------+---
 False     | a | null | 2

(Both updates failed to apply because there was an unmet conditional on one
of them)

select * from foo;

 x | y | t | z
---+---+---+----
 a | 1 | 2 | 10
 a | 2 | 2 | 20


begin batch update foo set z = 1 where x = 'a' and y = 1; update foo set z
= 2 where x = 'a' and y = 2 if t = 2; apply batch;

 [applied]
---
  True

(both updates succeeded because the check on t succeeded)

select * from foo;
 x | y | t | z
---+---+---+---
 a | 1 | 2 | 1
 a | 2 | 2 | 2

Hope this helps.

-Tupshin



On Fri, Feb 21, 2014 at 6:05 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Clint

  The Resolution status of the JIRA is set to Later; probably the
 implementation is not done yet. The JIRA was opened to discuss impl
 strategy, but nothing has been coded so far, I guess.



 On Sat, Feb 22, 2014 at 12:02 AM, Clint Kelly clint.ke...@gmail.comwrote:

 Folks,

 Does anyone know how I can modify multiple rows at once in a
 lightweight transaction in CQL3?

 I saw the following ticket:

 https://issues.apache.org/jira/browse/CASSANDRA-5633

 but it was not obvious to me from the comments how (or whether) this
 got resolved.  I also couldn't find anything in the DataStax
 documentation about how to perform these operations.

 I'm in particular interested in how to perform a compare-and-set
 operation that modifies multiple rows (with the same partition key)
 using the DataStax Java driver.

 Thanks!

 Best regards,
 Clint





abusing cassandra's multi DC abilities

2014-02-22 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
changed data from DC1 shows up in DC2.

Full Story:
We're planning on adding data centers throughout the US.  Our platform is
used for business communications.  Each DC currently utilizes elastic
search and redis.  A message can be sent from one user to another, and the
intent is that it would be seen in near-real-time.  This means that 2
people may be using different data centers, and the messages need to
propagate from one to the other.

On the plus side, we know we get this with Cassandra (fist pump) but the
other pieces, not so much.  Even if they did work, there's all sorts of
race conditions that could pop up from having different pieces of our
architecture communicating over different channels.  From this, we've
arrived at the idea that since Cassandra is the authoritative data source,
we might be able to trigger events in DC2 based on activity coming through
either the commit log or some other means.  One idea was to use a CF with a
low gc time as a means of transporting messages between DCs, and watching
the commit logs for deletes to that CF in order to know when we need to do
things like reindex a document (or a new document), bust cache, etc.
 Facebook did something similar with their modifications to MySQL to
include cache keys in the replication log.

Assuming this is sane, I'd want to avoid having the same event register on
3 servers, thus registering 3 items in the queue when only one should be
there.  So, for any piece of data replicated from the other DC, I'd need a
way to determine if it was supposed to actually trigger the event or not.
 (Maybe it looks at the token and determines if the current server falls in
the token range?)  Or is there a better way?
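For what it's worth, the "does the current server fall in the token range" check suggested above could be sketched roughly like this (a toy ring model; every name here is hypothetical, and a real implementation would derive the ring from the node's own metadata rather than a hard-coded dict):

```python
# Toy model of "only the token-range owner triggers the event".
# The ring layout and all names are illustrative, not a real driver API.
import bisect

def owner_of(token, ring):
    """Return the node owning `token` on a sorted token ring.

    A node owns the range (previous_token, its_token]; tokens beyond the
    highest token wrap around to the first node.
    """
    tokens = sorted(ring)
    idx = bisect.bisect_left(tokens, token) % len(tokens)
    return ring[tokens[idx]]

def should_fire(my_name, token, ring):
    # Every replica sees the replicated write, but only the range owner
    # enqueues the event, so exactly one queue item is created.
    return owner_of(token, ring) == my_name

ring = {0: "node-a", 100: "node-b", 200: "node-c"}
print(owner_of(150, ring))               # node-c owns (100, 200]
print(should_fire("node-a", 250, ring))  # True: 250 wraps around to node-a
```

(This ignores replication factor > 1 subtleties; with RF > 1 you would still have to designate a single replica per range, e.g. the primary, as the one that fires.)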

So, my questions to all ye Cassandra users:

1. Is this is even sane?
2. Is anyone doing it?

-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: List support in Net::Async::CassandraCQL ?

2014-02-22 Thread Jacob Rhoden
Hi Paul,

On 23 Feb 2014, at 4:15 am, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
 On Sat, 22 Feb 2014 14:03:06 +1100 Jacob Rhoden jacob.rho...@me.com wrote:
my $q = $cass->prepare("update contact set name=?, address=?
 where uuid=?")->get; push @f, $q->execute([$name, @address, $uuid]);
Future->needs_all( @f )->get;
 
 Returns the following:
 
Cannot encode address: not an ARRAY
 at /usr/local/share/perl/5.14.2/Net/Async/CassandraCQL/Query.pm line
 182
 
 It needs to arrive as an ARRAYref:
 
  $q->execute([$name, \@address, $uuid]);


Thanks! I did try this without success. Perhaps I am just making a simple perl 
mistake then? I’ve been doing java so long, my perl is a little rusty:

my @address = ();
if(defined $a1 && $a1 ne "") {
    push @address, $a1;
}
if(defined $a2 && $a2 ne "") {
    push @address, $a2;
}
if(defined $a3 && $a3 ne "") {
    push @address, $a3;
}

my @f;
my $q = $cass->prepare("update contact set name=?, address=? where
uuid=?")->get;
push @f, $q->execute([$name, \@address, $uuid]);
Future->needs_all( @f )->get;

But this also returns an error:

Cannot encode address: not an ARRAY at 
/usr/local/share/perl/5.14.2/Net/Async/CassandraCQL/Query.pm line 182



Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Thanks Duy Hai for sharing the details. I have a doubt: if for some reason 
there is a network partition, or more than 2 nodes serving the same 
partition/load fail, and you ended up writing hinted hand-offs, 

is there a possibility of data loss? If yes, how do we avoid that?


Regards,
Jagan

 On Sat, 22 Feb 2014 22:48:19 +0530 DuyHai Doan 
<doanduy...@gmail.com> wrote:


Jagan

A while ago I dealt with a similar queuing design for one customer.

If you never delete messages in the queue, then it is possible to use wide
rows with bucketing and an increasing monotonic column name to store messages.

CREATE TABLE read_only_queue (
    bucket_number int,
    insertion_time timeuuid,
    message text,
    PRIMARY KEY(bucket_number, insertion_time)
);

Let's say that you allow only 100 000 messages per partition (physical row)
to avoid too-wide rows; then inserting/reading from the table read_only_queue
is easy:
 

For the message producer:

1) Start at bucket_number = 1
2) Insert messages with column name = generated timeUUID with micro-second
precision (depending on whether the insertion rate is high or not)
3) If message count = 100 000, increment bucket_number by one and go to 2)

For the message reader:

1) Start at bucket_number = 1
2) Read messages by slices of N, save the insertion_time of the last read
message
3) Use the saved insertion_time to perform the next slice query
4) If read message count = 100 000, increment bucket_number and go to 2).
Keep the insertion_time, do not reset it since its value is increasing
monotonically
 

For multiple and concurrent producers & consumers, there is a trick. Let's
assume you have P concurrent producers and C concurrent consumers.

Assign a numerical ID to each producer and consumer: first producer ID = 1
... last producer ID = P. Same for consumers.

   - re-use the above algorithm
   - each producer/consumer starts at bucket_number = its ID
   - at the end of the row:
       - next bucket_number = current bucket_number + P for producers
       - next bucket_number = current bucket_number + C for consumers

The last thing to take care of is compaction configuration, to reduce the
number of SSTables on disk.
 

If you manage to avoid accumulation effects, e.g. the reading rate is faster
than the writing rate, the messages are likely to be consumed while still in
memory (in the memtable) on the server side. In this particular case, you can
optimize further by deactivating compaction for the table.
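The bucket-hopping scheme for concurrent producers described above can be sketched as a small, driver-free simulation (the function name and the 100 000 capacity constant simply mirror the description; nothing here touches Cassandra itself):

```python
# Pure simulation of the bucket-hopping scheme described above.
# No Cassandra driver is involved; names are illustrative only.

BUCKET_CAPACITY = 100_000  # max messages per partition, per the advice above

def producer_buckets(producer_id, num_producers, hops):
    """Yield the sequence of bucket_numbers one producer writes to.

    Each producer starts at bucket_number == its own ID and, once a
    bucket holds BUCKET_CAPACITY messages, jumps ahead by the total
    number of producers, so no two producers ever share a bucket.
    """
    bucket = producer_id
    for _ in range(hops):
        yield bucket
        bucket += num_producers

# With P = 3 producers, producer 2 fills buckets 2, 5, 8, 11, ...
print(list(producer_buckets(2, 3, 4)))  # [2, 5, 8, 11]
```

Consumers follow the same rule with their own count C, which is why the scheme tolerates P != C: readers and writers walk disjoint arithmetic progressions over the same bucket space at independent rates.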
 

Regards

Duy Hai

 On Sat, Feb 22, 2014 at 5:56 PM, Jagan Ranganathan <ja...@zohocorp.com>
wrote:
Hi,

Thanks for the pointer.

Following are some options given there:

- If you know where your live data begins, hint Cassandra with a start
column, to reduce the scan times and the amount of tombstones to collect.
- A broker will usually have some notion of what's next in the sequence and
thus be able to do much more targeted queries, down to a single record if the
storage strategy were to choose monotonic sequence numbers.

What we need to do is have some intelligence in using the system and avoid
tombstones: either use the pointed column name, or use a proper start column
if a slice query is used.

Is that right, or am I missing something here?

Regards,
Jagan
   
 On Sat, 22 Feb 2014 20:55:39 +0530 DuyHai Doan <doanduy...@gmail.com>
wrote:

   
Jagan

 Queue-like data structures are known to be one of the worst anti-patterns
for Cassandra:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets


 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan <ja...@zohocorp.com>
wrote:
Hi,

I need to decouple some of the work being processed from the user thread to
provide better user experience. For that I need a queuing system with the
following needs,
High Availability
No Data Loss
Better Performance.

Following are some libraries that were considered along with the limitation
I see,
Redis - Data Loss
ZooKeeper - Not advised for Queue system.
TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to be performing
better. With the replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

After checking on the third option above, I kind of wonder if Cassandra with
Leveled Compaction offers a similar system. Do you see any issues in such a
usage, or are there other better solutions available.
  

Will be great to get insights on this.

Regards,
Jagan

Re: [OT]: Can I have a non-delivering subscription?

2014-02-22 Thread Robert Wille
Yeah, it's called a rule. Set one up to delete everything from
user@cassandra.apache.org.

On 2/22/14, 10:32 AM, Paul LeoNerd Evans leon...@leonerd.org.uk
wrote:

A question about the mailing list itself, rather than Cassandra.

I've re-subscribed simply because I have to be subscribed in order to
send to the list, as I sometimes try to when people Cc questions about
my Net::Async::CassandraCQL perl module to me. However, if I want to
read the list, I usually do so on the online archives and not by mail.

Is it possible to have a non-delivering subscription, which would let
me send messages, but doesn't deliver anything back to me?

-- 
Paul LeoNerd Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Disabling opscenter data collection in Datastax community 2.0

2014-02-22 Thread user 01
I would be using nodetool and JConsole for monitoring. Though it would
be less informative, I think it will do. Otherwise also I cannot
use Opscenter, as I am not using DSE but DSC in production, so I
am not allowed to use it for prod use, isn't it? Not everyone here
is using DSE, hence Opscenter is not used in every Cassandra
production installation.

I installed Dsc20 using apt-get after adding the datastax repository as
suggested in the datastax Cassandra 2.0 docs. I found that the Opscenter
keyspace was created by default when I installed dsc20, and it would
make no sense for it to write data which is unused in case I don't use
Opscenter.

On 2/22/14, Michael Shuler mich...@pbandjelly.org wrote:
 On 02/22/2014 06:12 AM, user 01 wrote:
 I'm using dsc20 (datastax community edition for cassandra 2.0) in
 production environment. But since I am not authorized to use Opscenter
 for production use. So how do I disable the data recording that is being
 done for opscenter consumption, as this is just unusable for me and will
 put unnecessary load on my machine?

 The agent is lightweight.  How are you planning to monitor your
 production env?

 You didn't hint at how you installed DSC/OpsCenter, but stop the agents,
 uninstall the agents, and drop the keyspace.  The details of those steps
 depend on how you installed (rpm, deb, tar).

 http://www.datastax.com/documentation/opscenter/4.0/opsc/reference/opscInstallLocations_g.html
 http://www.datastax.com/documentation/opscenter/4.0/opsc/online_help/opscRemovingPackages_t.html

 Those docs might be helpful.  Let us know how you installed DSC, if you
 need some better details.

 --
 Kind regards,
 Michael



Re: Disabling opscenter data collection in Datastax community 2.0

2014-02-22 Thread Tupshin Harper
You can use OpsCenter in production with DSC/Apache Cassandra clusters.
Some features are only enabled with DSE, but the rest work fine with DSC.

-Tupshin
On Feb 22, 2014 11:20 PM, user 01 user...@gmail.com wrote:

  I would be using nodetool and JConsole for monitoring. Though it would
 be less informative but I think it will do. Otherwise also I cannot
 use Opscenter as I am not using the DSE but DSC, in production. So I
 am not allowed to use it for prod. use, Isn't it ? Not everyone here
 as well is using DSE hence Opscenter is not used in every Cassandra
 production installation.

 I installed Dsc20 using apt-get after adding datastax repository as
 suggested in the datastax's Cassandra 2.0 docs. I found that Opscenter
 keyspace was created by default when I installed dsc20,  it would
 make no sense that it writes data which is unused in case I don't use
 Opscenter.

 On 2/22/14, Michael Shuler mich...@pbandjelly.org wrote:
  On 02/22/2014 06:12 AM, user 01 wrote:
  I'm using dsc20 (datastax community edition for cassandra 2.0) in
  production environment. But since I am not authorized to use Opscenter
  for production use. So how do I disable the data recording that is being
   done for opscenter consumption, as this is just unusable for me and will
   put unnecessary load on my machine?
 
  The agent is lightweight.  How are you planning to monitor your
  production env?
 
  You didn't hint at how you installed DSC/OpsCenter, but stop the agents,
  uninstall the agents, and drop the keyspace.  The details of those steps
  depend on how you installed (rpm, deb, tar).
 
 
 http://www.datastax.com/documentation/opscenter/4.0/opsc/reference/opscInstallLocations_g.html
 
 http://www.datastax.com/documentation/opscenter/4.0/opsc/online_help/opscRemovingPackages_t.html
 
  Those docs might be helpful.  Let us know how you installed DSC, if you
  need some better details.
 
  --
  Kind regards,
  Michael