Re: why set replica placement strategy at keyspace level ?

2013-02-01 Thread aaron morton
Many of my mental models bother people :)

This particular one came from my understanding of Big Table and the code. 

For me this works, I think of (internal) rows as roughly containing the CF's. 

In the CQL world it works for me as well: the partition key (first part of the 
primary key) is important and identifies the storage container that has the 
columns. 

Your mileage may vary.
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 4:43 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 That should not bother you.
 
 For example, if you're doing an HBase scan that crosses two column families,
 that could end up being two (disk) seeks.
 
 Having an API that hides the seeks from you does not give you better
 performance; it only helps you when you're debating with people who do not
 understand the fundamentals.



Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread aaron morton
  I think a row mutation is isolated now, but is it across column families?
Correct, they are isolated, but only within an individual CF. 

 By the way, the wiki page really needs updating.
You can update if you would like to. 

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 12:33 PM, Manu Zhang owenzhang1...@gmail.com wrote:

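 From that wiki page, mutations against a single key are atomic but not 
 isolated. I think a row mutation is isolated now, but is it across 
 column families? By the way, the wiki page really needs updating.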

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread Manu Zhang

On Thu 31 Jan 2013 08:55:40 AM CST, aaron morton wrote:

  I think a row mutation is isolated now, but is it across column families?

Correct, they are isolated, but only within an individual CF.


By the way, the wiki page really needs updating.

You can update if you would like to.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 12:33 PM, Manu Zhang owenzhang1...@gmail.com wrote:


From that wiki page, mutations against a single key are atomic but not 
isolated. I think a row mutation is isolated now, but is it across 
column families? By the way, the wiki page really needs updating.

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread Edward Capriolo
That should not bother you.

For example, if you're doing an HBase scan that crosses two column families,
that could end up being two (disk) seeks.

Having an API that hides the seeks from you does not give you better
performance; it only helps you when you're debating with people who do not
understand the fundamentals.


Re: why set replica placement strategy at keyspace level ?

2013-01-29 Thread Manu Zhang

On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:



  So If I write to CF Users with rowkey=dean
and to CF Schedules with rowkey=dean, it is actually one row?

In my mental model that's correct.
A RowMutation is a row key and a collection of (internal) ColumnFamilies which 
contain the columns to write for a single CF.

This is the thing that is committed to the log, and then the changes in the 
ColumnFamilies are applied to each CF in an isolated way.


.(must have missed that several times in the
documentation).

http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


From that wiki page, mutations against a single key are atomic but not 
isolated. I think a row mutation is isolated now, but is it across 
column families? By the way, the wiki page really needs updating.


Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton
The row is the unit of replication: all values with the same storage engine row 
key in a KS are on the same nodes. If replication were set per CF, this would not hold.

Not that it would be the end of the world, but that is the first thing that 
comes to mind. 
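
To illustrate the point above, here is a minimal Python sketch of a toy ring. It is not Cassandra's code: `token`, `replicas_for`, `placement`, the four-node `RING` and both replication-factor settings are made up for illustration, and the clockwise walk only loosely resembles a simple placement strategy. The sketch shows that when placement is a keyspace setting, the column family name is not an input, so every CF's data for a given row key lands on the same nodes.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left


def token(row_key: str) -> int:
    """Map a row key to a ring position (toy stand-in for a partitioner)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key: str, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Pick the first node at or after the key's token, then walk clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


# Hypothetical four-node ring with evenly spaced tokens in the 128-bit MD5 space.
RING = [(i * 2**126, f"node{i}") for i in range(4)]

KEYSPACE_RF = 3                            # one setting for the whole keyspace
PER_CF_RF = {"Users": 3, "Schedules": 1}   # hypothetical per-CF alternative


def placement(cf: str, row_key: str, per_cf: bool = False) -> list[str]:
    """Return the replica nodes for one CF's slice of a row."""
    # The CF name only influences the result in the hypothetical per-CF mode.
    rf = PER_CF_RF[cf] if per_cf else KEYSPACE_RF
    return replicas_for(row_key, RING, rf)


if __name__ == "__main__":
    # Keyspace-level settings: both CFs share the same replicas for key "dean".
    print(placement("Users", "dean") == placement("Schedules", "dean"))
    # Per-CF settings: the replica sets can diverge for the same key.
    print(placement("Users", "dean", per_cf=True),
          placement("Schedules", "dean", per_cf=True))
```

With the per-CF variant, a single node failure could leave the `Schedules` data for a key unavailable while the `Users` data for the same key is still readable, which is the situation the keyspace-level setting avoids.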

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 Although I've known Cassandra for quite a while, this question has only 
 occurred to me recently:
 
 Why are the replica placement strategy and replication factor set at the 
 keyspace level?
 
 Would setting them at the column family level offer more flexibility?
 
 Is this because it's easier for users to manage an application? Or is it related to 
 the internal implementation? Or have I just overlooked something?



Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread Manu Zhang

On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:

The row is the unit of replication: all values with the same storage engine row 
key in a KS are on the same nodes. If replication were set per CF, this would not hold.

Not that it would be the end of the world, but that is the first thing that 
comes to mind.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:


Although I've known Cassandra for quite a while, this question has only 
occurred to me recently:

Why are the replica placement strategy and replication factor set at the keyspace 
level?

Would setting them at the column family level offer more flexibility?

Is this because it's easier for users to manage an application? Or is it related to 
the internal implementation? Or have I just overlooked something?




Is it important to store rows of different column families that share 
the same row key on the same node? AFAIK, Cassandra doesn't support getting 
all of them in a single call.


Meanwhile, what's the drawback of setting RPS and RF at the column family 
level?


Another thing that's been confusing me: when we talk about the 
data model, should the row key be inside or outside a column family?


Thanks



Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton
 
 Another thing that's been confusing me is that when we talk about the data 
 model should the row key be inside or outside a column family?
My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)

 Is it important to store rows of different column families that share the 
 same row key to the same node?
Makes the failure models a little easier to understand, e.g. everything keyed on 
user amorton is either available or not.

 Meanwhile, what's the drawback of setting RPS and RF at column family level?
Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4 CFs 
with the same row key, that is considered one mutation, for one row. That one 
RowMutation is directed to the replicas using the ReplicationStrategy and 
atomically applied to the commit log.

If you have RS per CF that one mutation would be split into 4, which would then 
be sent to different replicas. Even if they went to the same replicas they 
would be written to the commit log as different mutations. 

So if you have RS per CF you lose atomic commits for writes to the same row.
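
The argument above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the `RowMutation` dataclass only borrows the name used above, `split_per_cf` and `commit_log_entries` are hypothetical helpers, and the keyspace, column families and values are invented. Each element of the returned list stands for one commit log append, which is the unit that is applied all-or-nothing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RowMutation:
    """Toy model: one row key plus the columns to write, grouped by CF."""
    keyspace: str
    row_key: str
    changes: dict[str, dict[str, str]] = field(default_factory=dict)


def split_per_cf(mutation: RowMutation) -> list[RowMutation]:
    """What per-CF replication would force: one separate mutation per CF."""
    return [
        RowMutation(mutation.keyspace, mutation.row_key, {cf: cols})
        for cf, cols in mutation.changes.items()
    ]


def commit_log_entries(mutation: RowMutation, rs_per_cf: bool) -> list[RowMutation]:
    """Return the entries that would be appended to a (toy) commit log."""
    return split_per_cf(mutation) if rs_per_cf else [mutation]


# A single client write touching four CFs under the same row key.
write = RowMutation(
    keyspace="app",
    row_key="dean",
    changes={
        "Users": {"name": "Dean"},
        "Schedules": {"monday": "on call"},
        "Settings": {"theme": "dark"},
        "Audit": {"last_login": "2013-01-28"},
    },
)

if __name__ == "__main__":
    # Keyspace-level strategy: the whole write is one commit log entry.
    print(len(commit_log_entries(write, rs_per_cf=False)))
    # Per-CF strategy: the same write becomes one entry per column family.
    print(len(commit_log_entries(write, rs_per_cf=True)))
```

Once the write is split, a failure between two of the appends can leave some CFs updated and others not, which is the loss of atomicity described above.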

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com




Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread Hiller, Dean
If you write to 4 CF's with the same row key that is considered one
mutation

Hm, I never considered this, and never knew it either (very unintuitive from
a user perspective IMHO). So if I write to CF Users with rowkey=dean
and to CF Schedules with rowkey=dean, it is actually one row? (It's so
unintuitive that I had to ask to make sure I am reading that correctly.)

I guess I really don't have that case since most of my row keys are GUIDs
anyway, but it is very interesting and unexpected (not sure I really mind, I was
just taken aback).

PS: Not sure I ever minded losing atomic commits to the same row across
CFs, as I never expected it in the first place, having used Cassandra for
more than a year (must have missed that several times in the
documentation).

Thanks,
Dean





Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton

  So If I write to CF Users with rowkey=dean
 and to CF Schedules with rowkey=dean, it is actually one row?
In my mental model that's correct. 
A RowMutation is a row key and a collection of (internal) ColumnFamilies, each of 
which contains the columns to write for a single CF.

This is the thing that is committed to the log, and then the changes in the 
ColumnFamilies are applied to each CF in an isolated way. 
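
As a rough illustration of "logged once, then applied per CF", here is a Python sketch. It assumes a hypothetical in-memory `memtables` dict and an `apply_mutation` generator that pauses after each CF so the intermediate state can be inspected; none of these names or data structures come from Cassandra, and real isolation involves concurrency control that this single-threaded toy does not model.

```python
from __future__ import annotations

from typing import Iterator


def apply_mutation(
    row_key: str,
    changes: dict[str, dict[str, str]],
    commit_log: list[tuple[str, dict[str, dict[str, str]]]],
    memtables: dict[str, dict[str, dict[str, str]]],
) -> Iterator[str]:
    """Log the whole mutation once, then apply it one CF at a time."""
    # Step 1: the mutation, covering every CF, is logged as a single entry.
    commit_log.append((row_key, changes))
    # Step 2: each CF's columns are merged into that CF on its own.
    for cf, cols in changes.items():
        rows = memtables.setdefault(cf, {})
        merged = {**rows.get(row_key, {}), **cols}
        rows[row_key] = merged  # swap in the complete new row in one step
        yield cf                # pause: other CFs may not be updated yet


if __name__ == "__main__":
    log, tables = [], {}
    steps = apply_mutation(
        "dean",
        {
            "Users": {"name": "Dean", "email": "dean@example.com"},
            "Schedules": {"monday": "on call"},
        },
        log,
        tables,
    )
    next(steps)
    # One log entry exists, Users is fully updated, Schedules is untouched.
    print(len(log), sorted(tables))
    list(steps)
    print(sorted(tables))
```

In this toy, a reader that looks between the two steps sees every `Users` column from the write but nothing yet in `Schedules`: the change is complete within a CF, not across CFs.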

 .(must have missed that several times in the
 documentation).
http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

 
 



why set replica placement strategy at keyspace level ?

2013-01-26 Thread Manu Zhang
Although I've known Cassandra for quite a while, this question 
has only occurred to me recently:


Why are the replica placement strategy and replication factor set at the 
keyspace level?


Would setting them at the column family level offer more flexibility?

Is this because it's easier for users to manage an application? Or is it 
related to the internal implementation? Or have I just overlooked 
something?