How expensive are additional keyspaces?

2014-03-11 Thread Martin Meyer
Hey all -

My company is working on introducing a configuration service system to
provide config data to several of our applications, to be backed by
Cassandra. We're already using Cassandra for other services, and at
the moment our pending design just puts all the new tables (9 of them,
I believe) in one of our pre-existing keyspaces.

I've got a few questions about keyspaces that I'm hoping for input on.
Some Google hunting didn't turn up obvious answers, at least not for
recent versions of Cassandra.

1) What trade-offs are being made by using a new keyspace versus
re-purposing an existing one (that is in active use by another
application)? Organization is the obvious answer; I'm looking for any
technical reasons.

2) Is there any per-keyspace overhead incurred by the cluster?

3) Does it impact on-disk layout at all for tables to be in a
different keyspace from others? Is any sort of file fragmentation
potentially introduced just by doing this in a new keyspace as opposed
to an existing one?

4) Does it add any metadata overhead to the system keyspace?

5) Why might we *not* want to make a separate keyspace for this?

6) Does anyone have experience with creating additional keyspaces to
the point that Cassandra can no longer handle it? Note that we're
*not* planning to do this, I'm just curious.

Cheers,
Martin


Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The biggest expense is that you need to be authenticated to a
keyspace to perform an operation. Thus connection pools are bound to
keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
if you have 100 keyspaces you need 100 connection pools, which starts to be a
pain very quickly.

I suggest keeping everything in one keyspace unless you really need
different replication factors and/or network replication settings per
keyspace.
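
To make "different replication settings per keyspace" concrete, here is a minimal sketch in Java that only renders the CQL text for two keyspaces with different replication options. The keyspace names (config_service, app_data), the data center names (dc1, dc2), and the KeyspaceDdl helper are all made up for illustration; nothing is sent to a cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: renders CREATE KEYSPACE statements as plain strings.
final class KeyspaceDdl {
    private KeyspaceDdl() {}

    // One replication factor for the whole cluster (SimpleStrategy).
    static String simple(String keyspace, int replicationFactor) {
        return "CREATE KEYSPACE " + keyspace
            + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': "
            + replicationFactor + "};";
    }

    // One replica count per data center name (NetworkTopologyStrategy).
    static String networkTopology(String keyspace, Map<String, Integer> replicasPerDc) {
        StringJoiner options = new StringJoiner(", ");
        options.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> dc : replicasPerDc.entrySet()) {
            options.add("'" + dc.getKey() + "': " + dc.getValue());
        }
        return "CREATE KEYSPACE " + keyspace + " WITH replication = {" + options + "};";
    }
}

final class KeyspaceDdlDemo {
    public static void main(String[] args) {
        // LinkedHashMap keeps the data centers in insertion order.
        Map<String, Integer> perDc = new LinkedHashMap<>();
        perDc.put("dc1", 3);
        perDc.put("dc2", 2);
        System.out.println(KeyspaceDdl.simple("config_service", 3));
        System.out.println(KeyspaceDdl.networkTopology("app_data", perDc));
    }
}
```

If both applications would end up with the same replication options, the two statements differ only in the keyspace name, which is the case where a separate keyspace buys nothing beyond organization.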


Re: How expensive are additional keyspaces?

2014-03-11 Thread Keith Wright
Does this hold true for the native protocol?  I’ve noticed that you can create 
a session object in the DataStax driver without specifying a keyspace, and so 
long as you include the keyspace in all queries instead of just the table name, it 
works fine.  In that case, I assume there’s only one connection pool for all 
keyspaces.
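
A minimal sketch of the keyspace-qualified naming described above, in Java. The QualifiedTable helper and the keyspace and table names are hypothetical, and the sketch assumes simple lowercase identifiers that need no quoting. It only builds the query strings; with the DataStax Java driver, strings like these would be passed to Session.execute(String) on a session created without a default keyspace.

```java
// Hypothetical helper: builds keyspace-qualified table names so that one
// session (created without a default keyspace) can address any keyspace.
final class QualifiedTable {
    private final String keyspace;
    private final String table;

    QualifiedTable(String keyspace, String table) {
        this.keyspace = keyspace;
        this.table = table;
    }

    // Renders "keyspace.table", the form CQL accepts in place of a bare table name.
    String cql() {
        return keyspace + "." + table;
    }

    // Renders a full-table SELECT against the qualified name.
    String selectAll() {
        return "SELECT * FROM " + cql() + ";";
    }
}

final class QualifiedTableDemo {
    public static void main(String[] args) {
        QualifiedTable settings = new QualifiedTable("config_service", "settings");
        QualifiedTable users = new QualifiedTable("app_data", "users");
        // Both statements could be handed to the same session object.
        System.out.println(settings.selectAll());
        System.out.println(users.selectAll());
    }
}
```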


Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
The use of more than one keyspace is not uncommon.  Using hundreds of them is.  
That being said, different keyspaces let you specify different replication and 
different authentication.  If you are not going to be doing one of those 
things, then there really is no point to multiple keyspaces.  If you do want to 
do one of those things, then go for it, make multiple keyspaces.


-Jeremiah

On Mar 11, 2014, at 10:17 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I am not sure. As stated the only benefit of multiple keyspaces is if you 
 need:
  
 1) different replication per keyspace
 2) different multiple data center configurations per keyspace
 
 Unless you have one of these cases you do not need to do this. I would always 
 tackle this problem at the application level using something like:
 
 http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
 
 Client issues aside, it is not a very common case and I would advise against 
 uncommon setups.
 
 
 


Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
Also, in terms of overhead, on the server side the overhead is pretty much all at the 
Column Family (CF)/table level, so 100 keyspaces with 1 CF each is the same as 1 
keyspace with 100 CFs.

-Jeremiah



Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The mathematical overhead is one thing. I would guess if you tried some
design with 10,000 keyspaces and then you ran into a bug/performance
problem the first thing someone would say to you is WTF do you have that
many keyspaces :) Don't let that be you.




Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
This mistake is not a Thrift limitation. In 0.6.X you could switch
keyspaces without calling setKeyspace(String), because methods specified the
keyspace in every operation. This mirrors the StorageProxy class. In
0.7.X setKeyspace() was created and the keyspace was removed from all these
Thrift methods. I really dislike that change personally :)

If someone was so motivated, they could pretty easily (a couple of days' work)
add new methods to Thrift that do not have this limitation.
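
The difference between the two API shapes described above can be sketched with a toy in-memory model in Java. ToyStore, PerCallClient, and StatefulClient are hypothetical names invented for this sketch. They are not the generated Thrift classes, and the real method signatures are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory store keyed by "keyspace/rowkey"; stands in for a cluster.
final class ToyStore {
    private final Map<String, String> rows = new HashMap<>();

    void put(String keyspace, String rowKey, String value) {
        rows.put(keyspace + "/" + rowKey, value);
    }

    String get(String keyspace, String rowKey) {
        return rows.get(keyspace + "/" + rowKey);
    }
}

// 0.6-style shape: every call names its keyspace, so the client holds no state.
final class PerCallClient {
    private final ToyStore store;

    PerCallClient(ToyStore store) { this.store = store; }

    String get(String keyspace, String rowKey) {
        return store.get(keyspace, rowKey);
    }
}

// 0.7-style shape: the client remembers a keyspace set by an earlier call.
final class StatefulClient {
    private final ToyStore store;
    private String currentKeyspace; // null until setKeyspace is called

    StatefulClient(ToyStore store) { this.store = store; }

    void setKeyspace(String keyspace) { this.currentKeyspace = keyspace; }

    String get(String rowKey) {
        if (currentKeyspace == null) {
            throw new IllegalStateException("setKeyspace must be called first");
        }
        return store.get(currentKeyspace, rowKey);
    }
}

final class KeyspaceStylesDemo {
    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("ks_a", "row1", "a-value");
        store.put("ks_b", "row1", "b-value");

        PerCallClient perCall = new PerCallClient(store);
        System.out.println(perCall.get("ks_a", "row1"));
        System.out.println(perCall.get("ks_b", "row1"));

        StatefulClient stateful = new StatefulClient(store);
        stateful.setKeyspace("ks_a"); // extra call before the first read
        System.out.println(stateful.get("row1"));
        stateful.setKeyspace("ks_b"); // and again to switch keyspaces
        System.out.println(stateful.get("row1"));
    }
}
```

In the stateful shape, a pooled connection is only usable for whichever keyspace was last set on it, which is why client pools tend to end up bound to a single keyspace.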




On Tue, Mar 11, 2014 at 11:39 AM, Jonathan Ellis jbel...@gmail.com wrote:

 That is correct.  Another place where the mistakes of Thrift informed
 our development of the native protocol.


 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
I couldn't resist responding.

Having done some experiments comparing lots of purposely created keyspaces
against a single keyspace, the only good reasons I see for many keyspaces
are:

1. Each keyspace needs a different replication factor. Even in this case,
I personally can't justify having hundreds of different replication factor
settings. Beyond a replication factor of 4, my biased take is that the highest
number would be the number of datacenters, and 1 for local workstation
development.

2. Using keyspaces to logically organize the schema to support things like
multi-tenant applications.

I'm sure there are other valid reasons, but those are the ones that come to
my mind.




Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
If I have time this summer, I may work on that, since I like having Thrift.




Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
So in the 0.6.X days a signature of a get looked something like this:

get(String keyspace, ColumnPath cp, String rowkey)

Besides changes from String to ByteBuffer, the keyspace was pulled out of
the arguments.

I think the better, more flexible way to do this would be:

struct GetRequest {
   1: optional string keyspace,
   2: required binary rowkey,
   3: optional ColumnPath columnPath
}

get(GetRequest g)

This would put some burden on clients to make builder objects instead of
calling methods, but I think it would make the API easier to evolve.

However, it is hard for me to justify making a second copy of each method
for this small use case. Otherwise I would take that up.
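
As a rough illustration of what "builder objects instead of calling methods" could look like on the client side, here is a minimal Java sketch of a request object with a required row key and optional keyspace and column path. The class is hypothetical and not generated from any real Thrift definition. The column path is simplified to a String, which is an assumption made only to keep the sketch self-contained.

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical request object: rowKey is required, the other fields are optional.
final class GetRequest {
    private final ByteBuffer rowKey;
    private final String keyspace;   // null when not set
    private final String columnPath; // simplified to a String for this sketch

    private GetRequest(Builder b) {
        this.rowKey = b.rowKey;
        this.keyspace = b.keyspace;
        this.columnPath = b.columnPath;
    }

    // The required field is a builder argument, so it cannot be forgotten.
    static Builder builder(ByteBuffer rowKey) {
        return new Builder(rowKey);
    }

    ByteBuffer rowKey() { return rowKey.asReadOnlyBuffer(); }
    Optional<String> keyspace() { return Optional.ofNullable(keyspace); }
    Optional<String> columnPath() { return Optional.ofNullable(columnPath); }

    static final class Builder {
        private final ByteBuffer rowKey;
        private String keyspace;
        private String columnPath;

        private Builder(ByteBuffer rowKey) {
            if (rowKey == null) {
                throw new IllegalArgumentException("rowKey is required");
            }
            this.rowKey = rowKey;
        }

        Builder keyspace(String keyspace) { this.keyspace = keyspace; return this; }
        Builder columnPath(String columnPath) { this.columnPath = columnPath; return this; }
        GetRequest build() { return new GetRequest(this); }
    }
}

final class GetRequestDemo {
    public static void main(String[] args) {
        GetRequest request = GetRequest.builder(ByteBuffer.wrap(new byte[] {1, 2}))
            .keyspace("config_service")
            .build();
        System.out.println(request.keyspace().isPresent());
        System.out.println(request.columnPath().isPresent());
    }
}
```

New optional fields can be added to such an object without changing the signature of the method that accepts it, which is the evolvability argument made above.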



