Writing to multiple tables

2015-03-16 Thread Viswanathan Ramachandran
Hi,

Are Cassandra BATCH statements
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/batch_r.html
the recommended way to update the same information in multiple tables?

For example if I have the following tables:

person_by_dob
person_by_ssn
person_by_lastname


Then addition/modification of a person will result in three writes.

Is BATCH the recommended way of updating all three tables in one go, so that
the information in the three tables stays consistent?

In other words, is it an established Cassandra usage pattern to use the
BATCH feature for this purpose?

Are there alternate approaches and recommendations?

Thanks
Vish
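For concreteness, a logged batch over the three tables might look like the
following sketch (the person columns are hypothetical, since no schema was
given):

```sql
-- Logged batch: all three denormalized tables receive the same person.
-- The batchlog guarantees that either all statements are eventually
-- applied or none are; it does not make the writes isolated or faster.
BEGIN BATCH
  INSERT INTO person_by_dob      (dob, ssn, lastname, first_name)
    VALUES ('1980-01-01', '123-45-6789', 'Smith', 'Alice');
  INSERT INTO person_by_ssn      (ssn, dob, lastname, first_name)
    VALUES ('123-45-6789', '1980-01-01', 'Smith', 'Alice');
  INSERT INTO person_by_lastname (lastname, ssn, dob, first_name)
    VALUES ('Smith', '123-45-6789', '1980-01-01', 'Alice');
APPLY BATCH;
```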


Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
Is the consistency level honored for batch statements?

If I have 100 insert/update statements in my batch and use LOCAL_QUORUM
consistency, will control return from the coordinator only after a local
quorum write has completed for all 100 statements?

Or does it work differently?

Thanks
Vish
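For concreteness, in cqlsh the scenario would look roughly like this (the
table and values are made up):

```sql
-- cqlsh command: applies to all subsequent requests in this session
CONSISTENCY LOCAL_QUORUM;

BEGIN BATCH
  UPDATE accounts SET balance = 100 WHERE id = 1;
  UPDATE accounts SET balance = 200 WHERE id = 2;
  -- ... up to 100 statements ...
APPLY BATCH;
```

The question is whether APPLY BATCH returns once the batchlog is durable on
the coordinator, or only after every statement has satisfied LOCAL_QUORUM.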


Re: Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
A follow-up to the earlier question.

What I meant to ask is whether control returns to the client as soon as the
batch log is written on the coordinator, irrespective of the consistency
level specified.

Also: will the coordinator attempt the statements one after the other, or
in parallel?

Thanks


On Tue, Sep 16, 2014 at 8:00 AM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Is consistency level honored for batch statements?

 If I have 100 insert/update statements in my batch and use LOCAL_QUORUM
 consistency, will the control from coordinator return only after a local
 quorum update has been done for all the 100 statements?

 Or is it different ?

 Thanks
 Vish



Re: Scala driver

2014-09-02 Thread Viswanathan Ramachandran
I haven't used this Scala driver, but may explore it soon.

https://github.com/websudosuk/phantom
https://websudosuk.github.io/phantom/

Mentioned at:
http://planetcassandra.org/client-drivers-tools/#Scala
http://www.datastax.com/download#dl-community-drivers




On Mon, Sep 1, 2014 at 10:14 PM, Gary Zhao garyz...@gmail.com wrote:

 Thanks Jan. I decided to use Java driver directly. It's not hard to use.


 On Sun, Aug 31, 2014 at 1:08 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:

 Hi Gary,

 On 31 Aug 2014, at 07:19, Gary Zhao garyz...@gmail.com wrote:

 Hi

 Could you recommend a Scala driver and share your experience using it?
 I'm wondering whether I should just use the Java driver in Scala directly.


 I am using Martin’s approach without any problems:

 https://github.com/magro/play2-scala-cassandra-sample

 The actual mapping from Java to Scala futures for the async case is in


 https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala

 HTH,

 Jan



 Thanks






Re: LOCAL_QUORUM without a replica in current data center

2014-08-19 Thread Viswanathan Ramachandran
Sorry for the spam, but I wanted to double-check whether anyone has
experience with such a scenario.

Thanks.



On Sun, Aug 17, 2014 at 7:11 PM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Hi,

 How does LOCAL_QUORUM read/write behave when the data center on which
 query is executed does not have a replica of the keyspace?

 Does it result in an error or can it be configured to do LOCAL_QUORUM on
 the nearest data center (as depicted by the dynamic snitch) which has the
 replicas ?

 We are essentially trying to design a Cassandra cluster with a keyspace
 only in certain regional-hub data centers to keep number of replicas
 under control.
 I am curious to know if a cassandra node not in the regional-hub data
 center can handle LOCAL_QUORUM type operations, or if clients really need
 to have a connection to the hub data center with the replica to use that
 consistency level.

 Thanks
 Vish








LOCAL_QUORUM without a replica in current data center

2014-08-17 Thread Viswanathan Ramachandran
Hi,

How does LOCAL_QUORUM read/write behave when the data center in which the
query is executed does not have a replica of the keyspace?

Does it result in an error, or can it be configured to do LOCAL_QUORUM on
the nearest data center (as determined by the dynamic snitch) that has the
replicas?

We are essentially trying to design a Cassandra cluster with a keyspace
replicated only to certain regional-hub data centers, to keep the number of
replicas under control.
I am curious to know whether a Cassandra node outside a regional-hub data
center can handle LOCAL_QUORUM operations, or whether clients really need
a connection to the hub data center with the replicas to use that
consistency level.

Thanks
Vish
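For illustration, a keyspace restricted to the hub data centers would be
defined along these lines (the keyspace and data-center names are
placeholders):

```sql
-- Replicas exist only in the two hub DCs; the other data centers hold
-- no data for this keyspace, so a LOCAL_QUORUM request coordinated in
-- a non-hub DC would have no local replicas to satisfy it.
CREATE KEYSPACE person_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'hub_dc_east': 3,
    'hub_dc_west': 3
  };
```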


Nodetool Repair questions

2014-08-12 Thread Viswanathan Ramachandran
Some questions on nodetool repair.

1. This tool repairs inconsistencies across replicas of a row. Since the
latest update always wins, I don't see inconsistencies arising other than
from the combination of deletes, tombstones, and crashed nodes.
Technically, if data is never deleted from Cassandra, then nodetool repair
does not need to be run at all. Is this understanding correct? If not,
can anyone describe other ways inconsistencies could occur?

2. I want to understand the performance of nodetool repair in a Cassandra
multi-data-center setup. As we add nodes to the cluster in various data
centers, does the cost of nodetool repair on each node grow linearly, or
quadratically? The essence of this question is: if I have a keyspace with x
replicas in each data center, do I have to deal with an upper limit on the
number of data centers/nodes?


Thanks

Vish


Re: Nodetool Repair questions

2014-08-12 Thread Viswanathan Ramachandran
Thanks Mark,
Since we have replicas in each data center, adding a new data center
(and new replicas) has a performance implication for nodetool repair.
I understand that adding nodes without increasing the number of replicas may
improve repair performance, but in this case we are adding a new data center
and additional replicas, which adds overhead to nodetool repair.
Hence my thinking that we may reach an upper limit: the point at which
nodetool repair costs become prohibitively high.


On Tue, Aug 12, 2014 at 2:59 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Vish,

 1. This tool repairs inconsistencies across replicas of the row. Since
 latest update always wins, I dont see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?


 Even if you never delete data, you should run repairs occasionally to
 ensure overall consistency. While hinted handoff and read repair do lead
 to better consistency, they are only helpers/optimizations and are not
 regarded as operations that ensure consistency.

 2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the performance of nodetool repair on each node increase
 linearly, or is it quadratic ?


 It's difficult to estimate the performance of a repair; I've seen the time
 to completion fluctuate between 4 hrs and 10+ hrs on the same node. However,
 in theory, adding more nodes spreads the data and frees up machine
 resources, resulting in more performant repairs.

 The essence of this question is: If I have a keyspace with x number of
 replicas in each data center, do I have to deal with an upper limit on the
 number of data centers/nodes?


 Could you expand on why you believe there would be an upper limit on
 DCs/nodes due to running repairs?


 Mark


 On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran 
 vish.ramachand...@gmail.com wrote:

  Some questions on nodetool repair.

 1. This tool repairs inconsistencies across replicas of the row. Since
 latest update always wins, I dont see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?

 2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the performance of nodetool repair on each node increase
 linearly, or is it quadratic ? The essence of this question is: If I have a
 keyspace with x number of replicas in each data center, do I have to deal
 with an upper limit on the number of data centers/nodes?


 Thanks

 Vish





Re: Nodetool Repair questions

2014-08-12 Thread Viswanathan Ramachandran
Andrey, QUORUM consistency and no deletes makes perfect sense.
I believe we could generalize that to EACH_QUORUM or QUORUM consistency and
no deletes - isn't that right?

Thanks


On Tue, Aug 12, 2014 at 3:10 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 1. You don't have to repair if you use QUORUM consistency and you don't
 delete data.
 2. Performance depends on the size of data each node holds. It's very
 difficult to predict; it may take days.

 Thank you,
   Andrey



 On Tue, Aug 12, 2014 at 2:06 PM, Viswanathan Ramachandran 
 vish.ramachand...@gmail.com wrote:

 Some questions on nodetool repair.

 1. This tool repairs inconsistencies across replicas of the row. Since
 latest update always wins, I dont see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?

 2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the performance of nodetool repair on each node increase
 linearly, or is it quadratic ? The essence of this question is: If I have a
 keyspace with x number of replicas in each data center, do I have to deal
 with an upper limit on the number of data centers/nodes?


 Thanks

 Vish





Is nodetool cleanup necessary on nodes of different data center when new node is added

2014-08-07 Thread Viswanathan Ramachandran
I plan to have a multi-data-center Cassandra 2 setup with 2-4 nodes per
data center and several tens of data centers. We have keyspaces replicated
on a certain number of nodes in *each* data center. Essentially, each data
center has a logical ring that covers all token ranges. We have a
vnode-based deployment, so tokens should get assigned to the nodes
automatically.

The documentation at
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
suggests that adding a new node requires cleanup to be run on all other
nodes of the cluster. However, it does not clarify the procedure in a
multi-data-center setup.

My understanding is that nodetool cleanup removes data which no longer
belongs to a node. When a new data center is being set up, we are creating
completely new replicas and, AFAICT, this does not result in data
movement/rebalancing outside of the new data center; hence there is no
cleanup requirement on nodes of other data centers. Can someone confirm
that my understanding is right, and that cleanup is not required on nodes
of other data centers?


Thanks

Vish