Re: Queuing System

2014-02-23 Thread Jagan Ranganathan
Thanks Joe. That's a nice pointer. Will explore the possibility. I am just 
concerned about the leader swap time window, but maybe that's the tradeoff 
between data consistency and availability.

Regards,
Jagan


Re: Queuing System

2014-02-23 Thread Edward Capriolo
...@zohocorp.com wrote:

  Hi,

  I need to decouple some of the work being processed from the user thread
 to provide better user experience. For that I need a queuing system with
 the following needs,

- High Availability
- No Data Loss
- Better Performance.

 Following are some libraries that were considered along with the
 limitation I see,

- Redis - Data Loss
- ZooKeeper - Not advised for Queue system.
- TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be performing
better. With replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

 After checking on the third option above, I kind of wonder if Cassandra
 with Leveled Compaction offer a similar system. Do you see any issues in
 such a usage or is there other better solutions available.

 Will be great to get insights on this.

 Regards,
 Jagan








Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi,

I need to decouple some of the work being processed from the user thread to 
provide a better user experience. For that I need a queuing system with the 
following needs:

- High Availability
- No Data Loss
- Better Performance

Following are some libraries that were considered, along with the limitations 
I see:

- Redis - Data Loss
- ZooKeeper - Not advised for a queue system.
- TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform better. 
With the replication requirement, I probably have to look at Apache 
ActiveMQ+LevelDB.

After checking the third option above, I wonder whether Cassandra with 
Leveled Compaction offers a similar system. Do you see any issues with such a 
usage, or are there other, better solutions available?


It would be great to get insights on this.


Regards,
Jagan



Re: Queuing System

2014-02-22 Thread DuyHai Doan
Jagan

 Queue-like data structures are known to be one of the worst anti-patterns
for Cassandra:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets




Re: Queuing System

2014-02-22 Thread Laing, Michael
We use RabbitMQ for queuing and Cassandra for persistence.

RabbitMQ with clustering and/or federation should meet your high
availability needs.

Michael







Re: Queuing System

2014-02-22 Thread Tupshin Harper
While, historically, it has been true that queuing in Cassandra has been an
anti-pattern, it is also true that Leveled Compaction addresses the worst
aspect of frequent deletes in Cassandra, and that overall, queuing in
Cassandra is nowhere near the anti-pattern that it used to be. This is
something that I've been meaning to write about more extensively.

If your requirements are more around availability (particularly multi-DC)
and reliability with moderate (not extreme) performance, it is quite
possible to build a pretty decent system on top of Cassandra. You don't
mention your throughput requirements, nor additional semantics that might
be necessary (e.g. at-least-once vs exactly-once delivery), but
Cassandra 2.0's lightweight transactions provide a CAS primitive that can
be used to ensure deliver-once if that is a requirement.

I'd be happy to continue discussing appropriate data-models and access
patterns if you decide to go down this path.

-Tupshin
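A minimal sketch of the deliver-once idea, assuming a hypothetical claim table where each consumer runs something like `INSERT INTO claims (message_id, consumer_id) VALUES (?, ?) IF NOT EXISTS` before handling a message. The Python below does not talk to Cassandra: `ClaimTable` is an in-memory stand-in whose lock models the per-partition linearizability a lightweight transaction provides, and only the consumer whose insert is "applied" handles the message.

```python
import threading


class ClaimTable:
    """In-memory stand-in for a hypothetical table 'claims'
    (message_id PRIMARY KEY, consumer_id).

    claim() mimics INSERT ... IF NOT EXISTS: it is "applied" only if
    no row exists yet for message_id.
    """

    def __init__(self):
        self._rows = {}
        # Models the per-partition linearizability that LWT gives; not Paxos itself.
        self._lock = threading.Lock()

    def claim(self, message_id, consumer_id):
        with self._lock:
            if message_id in self._rows:
                return False  # like [applied] = False
            self._rows[message_id] = consumer_id
            return True  # like [applied] = True


def consume(table, consumer_id, message_ids, handled):
    # Every consumer sees every message, but only the winning claim handles it.
    for message_id in message_ids:
        if table.claim(message_id, consumer_id):
            handled.append((message_id, consumer_id))


table = ClaimTable()
handled = []
messages = list(range(1000))
workers = [
    threading.Thread(target=consume, args=(table, c, messages, handled))
    for c in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(handled))  # 1000: each message handled exactly once
```

Claiming is not the same as finishing: if a consumer crashes after its claim is applied, the message stays claimed, so a real design would need something extra (for example a lease or timeout) on top of this.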





Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi Michael,

Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which 
will give better performance: writing directly into Rabbit with Ack support, 
or writing to a temporary queue in Cassandra first and then dequeuing and 
publishing to Rabbit.


The complexities involved (handling scenarios like Rabbit connection failure, 
etc., vs. Cassandra write performance and replication with hinted handoff 
support, etc.) make me wonder if this is a better path.


Regards,
Jagan
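To make the "write directly into Rabbit with Ack support" option concrete, here is a rough sketch of retry-until-ack publishing. `FlakyBroker` is a hypothetical stand-in, not the RabbitMQ client API: it can drop the connection before storing a message, or store it and then lose the ack. Retrying until an ack arrives gives at-least-once delivery, which is also why duplicates can appear.

```python
import random


class FlakyBroker:
    """Hypothetical stand-in for a broker connection with acks.

    publish() may fail before storing the message, or store it and then
    lose the ack; both surface to the caller as ConnectionError.
    """

    def __init__(self, seed=0, fail_rate=0.3):
        self.rng = random.Random(seed)
        self.fail_rate = fail_rate
        self.stored = []

    def publish(self, msg):
        r = self.rng.random()
        if r < self.fail_rate / 2:
            raise ConnectionError("connection dropped before publish")
        self.stored.append(msg)
        if r < self.fail_rate:
            raise ConnectionError("ack lost after publish")
        return True  # ack received


def publish_with_retry(broker, msg, max_attempts=50):
    """Retry until the broker acks; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if broker.publish(msg):
                return attempt
        except ConnectionError:
            continue  # a real client would reconnect and back off here
    raise RuntimeError(f"gave up on {msg!r} after {max_attempts} attempts")


broker = FlakyBroker(seed=42)
for i in range(100):
    publish_with_retry(broker, f"job-{i}")
```

Every job ends up stored; any entries in `broker.stored` beyond the 100 jobs are duplicates caused by lost acks, so consumers would need to tolerate them (for example by processing idempotently).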


Re: Queuing System

2014-02-22 Thread Joe Stein
If performance and availability for messaging are requirements, then use Apache 
Kafka http://kafka.apache.org/

You can pass the same thrift/avro objects through the Kafka commit log or 
strings or whatever you want.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/




Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi Joe,

If my understanding is right, Kafka does not satisfy the high 
availability/replication part well because of the need for a leader and 
in-sync replicas. 


Regards,
Jagan


Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Hi,

Thanks for the pointer. 


Following are some options given there:

- If you know where your live data begins, hint Cassandra with a start column, 
to reduce the scan times and the amount of tombstones to collect.
- A broker will usually have some notion of what’s next in the sequence and 
thus be able to do much more targeted queries, down to a single record if the 
storage strategy were to choose monotonic sequence numbers.

What we need to do is use the system with some intelligence and avoid 
tombstones: either query the exact column name, or use a proper start column 
if a slice query is used.


Is that right, or am I missing something here?


Regards,
Jagan
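A toy model of the start-column point above, assuming a single wide row whose cells are keyed by a monotonically increasing sequence number and deleted after consumption. `Partition` is hypothetical: it only counts how many tombstones a slice has to step over, and is not how Cassandra stores data internally.

```python
import bisect

TOMBSTONE = object()  # marker for a deleted cell


class Partition:
    """Toy wide row: cells sorted by clustering key. Deleting a cell
    leaves a tombstone that a slice still has to scan past."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def insert(self, key, value):
        bisect.insort(self.keys, key)
        self.values[key] = value

    def delete(self, key):
        self.values[key] = TOMBSTONE

    def slice(self, start_after=None, limit=10):
        """Return (live rows, tombstones scanned) for keys > start_after."""
        i = 0 if start_after is None else bisect.bisect_right(self.keys, start_after)
        rows, scanned_tombstones = [], 0
        while i < len(self.keys) and len(rows) < limit:
            key = self.keys[i]
            value = self.values[key]
            if value is TOMBSTONE:
                scanned_tombstones += 1
            else:
                rows.append((key, value))
            i += 1
        return rows, scanned_tombstones


p = Partition()
for seq in range(1, 1001):
    p.insert(seq, f"msg-{seq}")
for seq in range(1, 901):  # the first 900 messages were consumed and deleted
    p.delete(seq)

rows_head, ts_head = p.slice(limit=10)                     # scan from the start of the row
rows_cursor, ts_cursor = p.slice(start_after=900, limit=10)  # start after the last consumed key
print(ts_head, ts_cursor)  # 900 0
```

Both slices return the same ten live messages; only the one that starts from the saved position avoids walking over the 900 tombstones.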


Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Thanks Tupshin for your assistance. As I mentioned in the other mail, yes, I am 
planning to use RabbitMQ for my messaging system. But I wonder which will give 
better performance: writing directly into Rabbit with Ack support, or writing 
to a temporary queue in Cassandra first and then dequeuing and publishing to 
Rabbit. I use Rabbit for messaging because of its routing and push-model 
communication, etc. So I am thinking of using Cassandra as a temporary queue, 
which will give fast write performance with no data loss, instead of waiting 
for the Rabbit Ack at the application level; it is also a question of handling 
Rabbit reconnection vs. relying on Cassandra's hinted handoff for writes.

So Cassandra might aggregate all my messages temporarily before I publish them 
to Rabbit. Is this fine? If so, please share your insight on which model & 
access pattern will be a better fit for this usage. Throughput requirements may 
be around 100 ops/sec.


Regards,
Jagan
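One way to read the plan above is as a buffer-and-relay loop: the user thread only writes to the buffer, and a separate relay publishes buffered rows to Rabbit and deletes each row only after its ack. The sketch below is a simulation under assumptions: `Buffer` stands in for a hypothetical Cassandra queue table and `UnreliableBroker` for a RabbitMQ connection with publisher acks; neither uses a real client library.

```python
import itertools
import random


class Buffer:
    """Stand-in for a hypothetical temporary queue table:
    rows keyed by a monotonically increasing sequence number."""

    def __init__(self):
        self.rows = {}
        self._seq = itertools.count(1)

    def enqueue(self, payload):
        seq = next(self._seq)
        self.rows[seq] = payload
        return seq

    def pending(self, limit):
        return sorted(self.rows.items())[:limit]

    def remove(self, seq):
        del self.rows[seq]


class UnreliableBroker:
    """Hypothetical broker: publish() returns True on ack and raises
    ConnectionError when the connection drops (before storing anything)."""

    def __init__(self, seed=1, fail_rate=0.2):
        self.rng = random.Random(seed)
        self.fail_rate = fail_rate
        self.delivered = []

    def publish(self, payload):
        if self.rng.random() < self.fail_rate:
            raise ConnectionError("broker unavailable")
        self.delivered.append(payload)
        return True


def relay_once(buffer, broker, batch=50):
    """Publish one batch of buffered rows; delete a row only after its ack."""
    moved = 0
    for seq, payload in buffer.pending(batch):
        try:
            if broker.publish(payload):
                buffer.remove(seq)
                moved += 1
        except ConnectionError:
            break  # keep the rest buffered; retry on the next pass
    return moved


buf, broker = Buffer(), UnreliableBroker()
for i in range(200):
    buf.enqueue(f"task-{i}")  # the user thread returns right after this write
while buf.rows:
    relay_once(buf, broker)
```

Because a row is deleted only after its ack, a relay crash between publish and delete would republish that row, so delivery is at-least-once. Each of those deletes also produces a tombstone in Cassandra, which is the cost the anti-pattern article describes.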



Re: Queuing System

2014-02-22 Thread DuyHai Doan
Jagan

Some time ago I dealt with a similar queuing design for a customer.

*If you never delete messages in the queue*, then it is possible to use
wide rows with bucketing and monotonically increasing column names to store
messages.

CREATE TABLE read_only_queue (
   bucket_number int,
   insertion_time timeuuid,
   message text,
   PRIMARY KEY(bucket_number, insertion_time)
);

 Let's say that you allow only 100 000 messages per partition (physical
row) to avoid overly wide rows; then inserting into and reading from the
table *read_only_queue* is easy:

 For message producer:

   1) Start at bucket_number = 1
   2) Insert messages with column name = a generated timeUUID with
microsecond precision (depending on whether the insertion rate is high or
not)
   3) If message count = 100 000, increment bucket_number by one and go to
2)
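The producer steps can be sketched in Python as follows. This is an in-memory illustration under assumptions: a dict of lists stands in for the `read_only_queue` table, and an increasing integer counter stands in for the timeuuid `insertion_time`.

```python
import itertools
from collections import defaultdict

BUCKET_SIZE = 100_000  # messages per partition (physical row), as in the text


class QueueTable:
    """Stand-in for read_only_queue: bucket_number -> [(insertion_time, message)]."""

    def __init__(self):
        self.partitions = defaultdict(list)


class Producer:
    def __init__(self, table, bucket_size=BUCKET_SIZE):
        self.table = table
        self.bucket_size = bucket_size
        self.bucket_number = 1  # step 1
        self.count = 0
        # Assumption: an increasing integer counter stands in for a timeuuid.
        self.clock = itertools.count(1)

    def send(self, message):
        # Step 2: insert with an increasing insertion_time.
        self.table.partitions[self.bucket_number].append((next(self.clock), message))
        self.count += 1
        # Step 3: once the row holds bucket_size messages, move to the next bucket.
        if self.count == self.bucket_size:
            self.bucket_number += 1
            self.count = 0


table = QueueTable()
p = Producer(table, bucket_size=3)  # tiny bucket so the rollover is visible
for i in range(7):
    p.send(f"m{i}")
print({b: len(rows) for b, rows in table.partitions.items()})  # {1: 3, 2: 3, 3: 1}
```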

For message reader:

   1) Start at bucket_number = 1
   2) Read messages by slices of *N*, save the *insertion_time* of the last
read message
   3) Use the saved *insertion_time* to perform the next slice query
   4) If read messages count = 100 000, increment bucket_number and go to
2). Keep the *insertion_time*; do not reset it, since its value is
increasing monotonically
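A matching sketch of the reader, under the same assumptions (in-memory partitions, integer timestamps in place of timeuuids). `read_slice` is a hypothetical helper that mimics a slice query of the form `SELECT ... WHERE bucket_number = ? AND insertion_time > ? LIMIT n`.

```python
def read_slice(partitions, bucket, after, n):
    """Mimics WHERE bucket_number = bucket AND insertion_time > after LIMIT n.
    Rows in each partition are assumed sorted by insertion_time."""
    rows = partitions.get(bucket, [])
    return [r for r in rows if after is None or r[0] > after][:n]


class Consumer:
    def __init__(self, partitions, bucket_size, slice_size):
        self.partitions = partitions
        self.bucket_size = bucket_size
        self.slice_size = slice_size
        self.bucket_number = 1  # step 1
        self.last_time = None  # insertion_time of the last message read
        self.read_in_bucket = 0

    def poll(self):
        # Step 2: read the next slice of at most slice_size messages.
        rows = read_slice(self.partitions, self.bucket_number, self.last_time, self.slice_size)
        if rows:
            self.last_time = rows[-1][0]  # step 3: the next slice starts after this
            self.read_in_bucket += len(rows)
        # Step 4: bucket fully read -> move on, but keep last_time (it only increases).
        if self.read_in_bucket == self.bucket_size:
            self.bucket_number += 1
            self.read_in_bucket = 0
        return [message for _, message in rows]


# Seven messages spread over buckets of 3, with integer timestamps 1..7.
partitions = {}
for t in range(1, 8):
    partitions.setdefault((t - 1) // 3 + 1, []).append((t, f"m{t - 1}"))

c = Consumer(partitions, bucket_size=3, slice_size=2)
out = []
for _ in range(6):
    out.extend(c.poll())
print(out)  # ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6']
```

When the current bucket is only partly filled, `poll()` simply returns fewer (or no) messages and the reader stays on that bucket until the producer fills it.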

For multiple concurrent producers and consumers, there is a trick. Let's
assume you have *P* concurrent producers and *C* concurrent consumers.

  Assign a numerical ID to each producer and consumer. First producer ID =
1 ... last producer ID = *P*. Same for consumers.

  - re-use the above algorithm
  - each producer/consumer starts at *bucket_number* = its ID
  - at the end of the row,
      - next bucket_number = current bucket_number + *P* for producers
      - next bucket_number = current bucket_number + *C* for consumers
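The bucket assignment for several producers and consumers can be checked with a few lines of Python. This only enumerates bucket numbers (1-based IDs, strides *P* and *C* as described above) and does not model the rows themselves; the values P = 2 and C = 3 are arbitrary.

```python
from itertools import islice


def buckets(worker_id, stride):
    """Bucket numbers visited by the worker with this 1-based ID:
    start at its own ID, then jump by the number of workers of its kind."""
    b = worker_id
    while True:
        yield b
        b += stride


def first_buckets(worker_id, stride, k):
    return list(islice(buckets(worker_id, stride), k))


P, C = 2, 3
producers = {p: first_buckets(p, P, 6) for p in range(1, P + 1)}
consumers = {c: first_buckets(c, C, 4) for c in range(1, C + 1)}
print(producers)  # {1: [1, 3, 5, 7, 9, 11], 2: [2, 4, 6, 8, 10, 12]}
print(consumers)  # {1: [1, 4, 7, 10], 2: [2, 5, 8, 11], 3: [3, 6, 9, 12]}
```

Every bucket is written by exactly one producer and read by exactly one consumer, even when *P* and *C* differ. The flip side is that a consumer's next bucket may belong to a different producer, so one slow producer can hold up whichever consumer is waiting on its bucket.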


The last thing to take care of is compaction configuration to reduce the
number of SSTables on disk.

If you manage to get rid of accumulation effects, e.g. the reading rate is
faster than the writing rate, the messages are likely to be consumed while
they are still in memory (in the memtable) on the server side. In this
particular case, you can optimize further by deactivating compaction for the table.

Regards

 Duy Hai









Re: Queuing System

2014-02-22 Thread thunder stumpges
We use this same setup also and it works great. 
Thunder

- Reply message -
From: Laing, Michael michael.la...@nytimes.com
To: user@cassandra.apache.org
Subject: Queuing System
Date: Sat, Feb 22, 2014 7:31 AM


Re: Queuing System

2014-02-22 Thread thunder stumpges
This seems a bit overkill. We run far more than 100 mps (closer to 600) in 
Rabbit with very good latency on a 3-node cluster. It has been very reliable 
as well. 
Thunder

- Reply message -
From: Jagan Ranganathan ja...@zohocorp.com
To: user@cassandra.apache.org
Subject: Queuing System
Date: Sat, Feb 22, 2014 9:06 AM


Re: Queuing System

2014-02-22 Thread Joe Stein
Without them you have no durability.  

With them you have guarantees, more than any other system with messaging 
features. It is a durable CP commit log. It works very well for data pipelines 
with AP systems like Cassandra, which is a different system solving different 
problems. When a Kafka leader fails, you might block and wait for 10ms while a 
new leader is elected, but writes can be guaranteed.

The consumers then read and process data and write to Cassandra, and then your 
app reads from Cassandra for what was processed.
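A toy model of the pipeline described here, assuming a consumer that writes its result to the store first and commits its log offset only afterwards. `Pipeline` is hypothetical and is not the Kafka consumer API; it shows that a crash between the write and the commit causes the record to be reprocessed, which is harmless when the write is an upsert keyed by the record, as Cassandra writes are.

```python
class Pipeline:
    """Toy consumer: reads a durable log, upserts results into a key-value
    store, and only then commits its offset (not a real Kafka client)."""

    def __init__(self, log):
        self.log = log  # list of (key, value) records; list index = offset
        self.committed = 0  # next offset to read after a restart
        self.store = {}  # stand-in for a Cassandra table (writes are upserts)
        self.writes = 0

    def run(self, crash_after_write_at=None):
        offset = self.committed
        while offset < len(self.log):
            key, value = self.log[offset]
            self.store[key] = value.upper()  # "process" and upsert the result
            self.writes += 1
            if offset == crash_after_write_at:
                return False  # crashed before committing this offset
            offset += 1
            self.committed = offset  # commit only after the write succeeded
        return True


log = [(f"k{i}", f"v{i}") for i in range(5)]
p = Pipeline(log)
first = p.run(crash_after_write_at=2)  # writes k0..k2, but commits only offsets 0 and 1
second = p.run()  # restarts at offset 2, so k2 is written a second time
print(first, second, p.writes)  # False True 6
```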

These are very typical architectures at scale: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Feb 22, 2014, at 11:49 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi Joe,
 
 If my understanding is right, Kafka does not satisfy the high
 availability/replication part well because of the need for a leader and in-sync
 replicas.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 22:02:27 +0530 Joe Stein <crypt...@gmail.com> wrote
  
 
 If performance and availability for messaging are requirements, then use
 Apache Kafka http://kafka.apache.org/
 
 You can pass the same thrift/avro objects through the Kafka commit log or 
 strings or whatever you want.
 
 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop
 /
 
 
 On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:
 
 Hi Michael,
 
 Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which
 will give better performance: writing directly into Rabbit with ack support,
 vs. writing to a temporary queue in Cassandra first and then dequeuing and
 publishing to Rabbit.
 
 The complexities involved (handling scenarios like a Rabbit connection failure,
 etc.) vs. Cassandra's write performance and replication with hinted handoff
 support make me wonder if this is a better path.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing 
 michael.la...@nytimes.com wrote  
 
 We use RabbitMQ for queuing and Cassandra for persistence.
 
 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.
 
 Michael
 
 
 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Jagan 
 
  Queue-like data structures are known to be one of the worst anti-patterns
 for Cassandra:
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
 
 
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:
 Hi,
 
 I need to decouple some of the work being processed from the user thread to
 provide a better user experience. For that I need a queuing system with the
 following needs:
 High Availability
 No Data Loss
 Better Performance
 Following are some libraries that were considered, along with the limitations
 I see:
 Redis - Data Loss
 ZooKeeper - Not advised for a queue system.
 TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform better.
 With the replication requirement, I probably have to look at Apache
 ActiveMQ+LevelDB.
 After checking on the third option above, I wonder if Cassandra with Leveled
 Compaction offers a similar system. Do you see any issues with such usage, or
 are there better solutions available?
 
 It would be great to get insights on this.
 
 Regards,
 Jagan
 
 
 
 


Re: Queuing System

2014-02-22 Thread Jagan Ranganathan
Thanks Duy Hai for sharing the details. I have a doubt. If for some reason 
there is a Network Partition or more than 2 Node failure serving the same 
partition/load and you ended up writing hinted hand-off. 

Is there a possibility of a data loss? If yes, how do we avoid that?


Regards,
Jagan

 On Sat, 22 Feb 2014 22:48:19 +0530 DuyHai Doan <doanduy...@gmail.com> wrote


Jagan
 

Some time ago I dealt with a similar queuing design for a customer.
 

If you never delete messages from the queue, then it is possible to use wide
rows with bucketing and monotonically increasing column names to store messages.
 

CREATE TABLE read_only_queue (
   bucket_number int,        -- partition key: one physical row per bucket
   insertion_time timeuuid,  -- clustering column: keeps messages in time order
   message text,
   PRIMARY KEY (bucket_number, insertion_time)
);
 

Let's say that you allow only 100,000 messages per partition (physical row)
to avoid overly wide rows. Then inserting into and reading from the
read_only_queue table is easy:
 

For the message producer:

1) Start at bucket_number = 1
2) Insert messages with column name = a generated timeUUID with microsecond
precision (depending on whether the insertion rate is high or not)
3) If the message count = 100,000, increment bucket_number by one and go to 2)
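The producer steps above can be sketched in Python against a hypothetical in-memory stand-in for read_only_queue. The Producer class, the max_per_bucket parameter, and the integer clock used in place of a real timeuuid are illustrative assumptions, not part of Duy Hai's design or of any Cassandra client.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical in-memory stand-in for the read_only_queue table:
# bucket_number (partition key) -> [(insertion_time, message), ...] kept in
# clustering order. An increasing integer stands in for the timeuuid column.
Table = DefaultDict[int, List[Tuple[int, str]]]


class Producer:
    def __init__(self, table: Table, max_per_bucket: int = 100_000) -> None:
        self.table = table
        self.max_per_bucket = max_per_bucket  # messages allowed per partition
        self.bucket_number = 1  # step 1: start at bucket_number = 1
        self.count_in_bucket = 0
        self._clock = 0  # stand-in for a timeuuid generator

    def publish(self, message: str) -> Tuple[int, int]:
        """Insert one message; return the (bucket_number, insertion_time) used."""
        # Step 2: insert with a monotonically increasing clustering value.
        self._clock += 1
        bucket = self.bucket_number
        self.table[bucket].append((self._clock, message))
        self.count_in_bucket += 1
        # Step 3: once the row holds max_per_bucket messages, move to the next.
        if self.count_in_bucket == self.max_per_bucket:
            self.bucket_number += 1
            self.count_in_bucket = 0
        return bucket, self._clock


if __name__ == "__main__":
    table: Table = defaultdict(list)
    producer = Producer(table, max_per_bucket=3)  # tiny bucket size for the demo
    for i in range(7):
        producer.publish(f"message-{i}")
    print({bucket: len(rows) for bucket, rows in table.items()})  # {1: 3, 2: 3, 3: 1}
```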
 

For the message reader:

1) Start at bucket_number = 1
2) Read messages in slices of N, and save the insertion_time of the last
message read
3) Use the saved insertion_time to perform the next slice query
4) If the read message count = 100,000, increment bucket_number and go to 2).
Keep the insertion_time; do not reset it, since its value increases
monotonically
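A matching sketch of the reader steps, again against a hypothetical in-memory table. slice_query only simulates the shape of the CQL slice (bucket_number = ?, insertion_time > ?, LIMIT N); the Consumer class and its parameters are illustrative names, and integers stand in for timeuuids.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for read_only_queue: bucket_number -> rows in
# clustering order; an increasing integer stands in for the timeuuid.
Table = Dict[int, List[Tuple[int, str]]]


def slice_query(table: Table, bucket: int, after: Optional[int], limit: int) -> List[Tuple[int, str]]:
    """Simulates: SELECT insertion_time, message FROM read_only_queue
    WHERE bucket_number = ? AND insertion_time > ? LIMIT ?"""
    rows = table.get(bucket, [])
    return [(t, m) for t, m in rows if after is None or t > after][:limit]


class Consumer:
    def __init__(self, table: Table, slice_size: int, max_per_bucket: int = 100_000) -> None:
        self.table = table
        self.slice_size = slice_size  # N in the steps above
        self.max_per_bucket = max_per_bucket
        self.bucket_number = 1  # step 1
        self.last_time: Optional[int] = None  # saved insertion_time, never reset
        self.read_in_bucket = 0

    def poll(self) -> List[str]:
        # Steps 2-3: read the next slice after the saved insertion_time.
        rows = slice_query(self.table, self.bucket_number, self.last_time, self.slice_size)
        if rows:
            self.last_time = rows[-1][0]
            self.read_in_bucket += len(rows)
        # Step 4: the row is fully read, so move to the next bucket but keep
        # last_time, because insertion_time keeps increasing across buckets.
        if self.read_in_bucket >= self.max_per_bucket:
            self.bucket_number += 1
            self.read_in_bucket = 0
        return [m for _, m in rows]


if __name__ == "__main__":
    # Two buckets with max_per_bucket = 3 (tiny values for the demo).
    table: Table = {1: [(1, "a"), (2, "b"), (3, "c")], 2: [(4, "d"), (5, "e")]}
    consumer = Consumer(table, slice_size=2, max_per_bucket=3)
    for _ in range(4):
        print(consumer.bucket_number, consumer.poll())
```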
 

For multiple concurrent producers & consumers, there is a trick. Let's
assume you have P concurrent producers and C concurrent consumers.

Assign a numerical ID to each producer and consumer: first producer ID = 1
... last producer ID = P. Same for consumers.

   - re-use the above algorithm
   - each producer/consumer starts at bucket_number = its ID
   - at the end of the row,
     - next bucket_number = current bucket_number + P for producers
     - next bucket_number = current bucket_number + C for consumers
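The bucket assignment trick can be sketched as follows, assuming 1-based IDs as described above. bucket_sequence and first_buckets are hypothetical helper names; the sketch covers only which buckets each producer or consumer visits, not the reads and writes themselves.

```python
from itertools import count, islice
from typing import Iterator, List


def bucket_sequence(worker_id: int, num_workers: int) -> Iterator[int]:
    """Buckets visited by one producer (or consumer), IDs numbered 1..num_workers.

    The worker starts at bucket_number = worker_id and, at the end of each
    row, jumps ahead by num_workers (P for producers, C for consumers).
    """
    if not 1 <= worker_id <= num_workers:
        raise ValueError("worker_id must be between 1 and num_workers")
    return count(worker_id, num_workers)


def first_buckets(worker_id: int, num_workers: int, n: int) -> List[int]:
    """The first n buckets that a given worker will use."""
    return list(islice(bucket_sequence(worker_id, num_workers), n))


if __name__ == "__main__":
    P, C = 2, 3  # example: 2 producers, 3 consumers
    for p in range(1, P + 1):
        print("producer", p, first_buckets(p, P, 4))
    for c in range(1, C + 1):
        print("consumer", c, first_buckets(c, C, 4))
```

Because the P producer sequences cover every positive bucket number exactly once, and so do the C consumer sequences, each bucket is written by exactly one producer and read by exactly one consumer, even when P and C differ.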
  
 

 The last thing to take care of is the compaction configuration, to reduce the
number of SSTables on disk.
 

 If you manage to get rid of accumulation effects, i.e. the reading rate is
faster than the writing rate, messages are likely to be consumed while they are
still in memory (in the memtable) on the server side. In this particular case,
you can optimize further by deactivating compaction for the table.
 

 Regards
 

  Duy Hai 

 
 On Sat, Feb 22, 2014 at 5:56 PM, Jagan Ranganathan lt;ja...@zohocorp.comgt; 
wrote:
   Hi, 

 Thanks for the pointer. 
  

 Following are some options given there:
If you know where your live data begins, hint Cassandra with a start 
column, to reduce the scan times and the amount of tombstones to collect.
   A broker will usually have some notion of what’s next in the sequence and 
thus be able to do much more targeted queries, down to a single record if the 
storage strategy were to choose monotonic sequence numbers.

  What we need to do is use the system intelligently and avoid tombstones:
either query the specific column name directly, or use a proper start column
if a slice query is used.
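The start-column advice can be illustrated with a toy model of a single wide row, assuming deleted messages leave tombstones (modeled here as None) and integer column names stand in for timeuuids. slice_live is a hypothetical helper, not Cassandra's read path; it only counts how many tombstones a slice walks past with and without a start column.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical model of one wide row in clustering order:
# (column_name, value), where value None marks a tombstone (deleted message).
Row = List[Tuple[int, Optional[str]]]


def slice_live(row: Row, limit: int, start: Optional[int] = None) -> Tuple[List[Tuple[int, str]], int]:
    """Return up to `limit` live columns and how many tombstones were scanned.

    start=None models an unbounded slice from the head of the row; a start
    column models the hint, so the scan loop never visits earlier columns.
    """
    names = [name for name, _ in row]
    begin = 0 if start is None else bisect_left(names, start)  # seek to start
    live: List[Tuple[int, str]] = []
    tombstones_scanned = 0
    for name, value in row[begin:]:
        if value is None:
            tombstones_scanned += 1  # deleted data still has to be read past
            continue
        live.append((name, value))
        if len(live) == limit:
            break
    return live, tombstones_scanned


if __name__ == "__main__":
    # Three consumed-and-deleted messages followed by three live ones.
    row: Row = [(1, None), (2, None), (3, None), (4, "d"), (5, "e"), (6, "f")]
    print(slice_live(row, limit=2))           # scans 3 tombstones
    print(slice_live(row, limit=2, start=4))  # scans 0 tombstones
```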
   

  Is that right, or am I missing something here?
   

  Regards,
  Jagan
   
 On Sat, 22 Feb 2014 20:55:39 +0530 DuyHai Doan <doanduy...@gmail.com> wrote

   
Jagan 
   

   Queue-like data structures are known to be one of the worst anti-patterns
for Cassandra:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  


  
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan lt;ja...@zohocorp.comgt; 
wrote:
   Hi, 

I need to decouple some of the work being processed from the user thread to
provide a better user experience. For that I need a queuing system with the
following needs:
High Availability
No Data Loss
Better Performance

Following are some libraries that were considered, along with the limitations
I see:
Redis - Data Loss
ZooKeeper - Not advised for a queue system.
TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform better.
With the replication requirement, I probably have to look at Apache
ActiveMQ+LevelDB.

After checking on the third option above, I wonder if Cassandra with Leveled
Compaction offers a similar system. Do you see any issues with such usage, or
are there better solutions available?
  

 It would be great to get insights on this.
 

 Regards,
 Jagan