Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Ryan Svihla
Clarification: 'a keyspace for each' should read 'a keyspace for Cassandra
tables and a keyspace for Solr tables'.

On Fri, Dec 12, 2014 at 11:25 AM, Ryan Svihla rsvi...@datastax.com wrote:

 It would make more sense to just have a keyspace for each: something like
 solr_tables and cassandra_tables. I've done something similar with most
 customers using DSE Search (this isn't a DSE mailing list, but the
 information is interesting background for your question).

 There is a cost to each keyspace, and at some point the overhead of managing
 them all becomes expensive in terms of total heap usage (your mileage may
 vary depending on many factors). Breaking keyspaces up into logical
 replication groups makes the most sense from a maintainability and
 performance standpoint.

 On Fri, Dec 12, 2014 at 11:21 AM, Eric Stevens migh...@gmail.com wrote:

 We're considering moving to a model where we put each of our tables in a
 dedicated keyspace. This would let us tune replication per table and change
 our minds about that replication on a per-table basis without a major
 migration. The biggest driver for this is Solr integration: we want to tune
 RF into our Solr DC so that only the tables we want to search are replicated
 there (using NetworkTopologyStrategy with 'solr': 0 for tables which are not
 searchable).

 Has anyone else tried this? Is there any reason we might not want to do so,
 or any hidden gotchas we should be concerned about? Our total table count is
 small, in the tens range; our searchable tables number maybe 4 or 5.



--
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Ryan Svihla
It would make more sense to just have a keyspace for each: something like
solr_tables and cassandra_tables. I've done something similar with most
customers using DSE Search (this isn't a DSE mailing list, but the
information is interesting background for your question).

There is a cost to each keyspace, and at some point the overhead of managing
them all becomes expensive in terms of total heap usage (your mileage may
vary depending on many factors). Breaking keyspaces up into logical
replication groups makes the most sense from a maintainability and
performance standpoint.
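
For example, something like this (a rough, untested sketch using the Java
driver; the DC names 'cassandra' and 'solr' and the replication factors are
placeholders for whatever your cluster actually uses):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class KeyspaceSplit {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Searchable tables: replicas in both the Cassandra and Solr DCs.
        session.execute("CREATE KEYSPACE IF NOT EXISTS solr_tables WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'cassandra': 3, 'solr': 2}");

        // Non-searchable tables: no replicas in the Solr DC.
        session.execute("CREATE KEYSPACE IF NOT EXISTS cassandra_tables WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'cassandra': 3, 'solr': 0}");

        cluster.close();
    }
}

Note that 'solr': 0 is equivalent to simply leaving the 'solr' DC out of the
replication map.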

On Fri, Dec 12, 2014 at 11:21 AM, Eric Stevens migh...@gmail.com wrote:

 We're considering moving to a model where we put each of our tables in a
 dedicated keyspace. This would let us tune replication per table and change
 our minds about that replication on a per-table basis without a major
 migration. The biggest driver for this is Solr integration: we want to tune
 RF into our Solr DC so that only the tables we want to search are replicated
 there (using NetworkTopologyStrategy with 'solr': 0 for tables which are not
 searchable).

 Has anyone else tried this? Is there any reason we might not want to do so,
 or any hidden gotchas we should be concerned about? Our total table count is
 small, in the tens range; our searchable tables number maybe 4 or 5.



--
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Eric Stevens
Well, we started with the thought that we'd have two keyspaces, one for
searchables and one for non-searchables, like you mentioned. But our concern
is that we may change our minds in the future about which column families are
available for search. A separate keyspace per table gives us greater
flexibility in that regard.
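
For example, making a table searchable later would come down to something
like this (a hypothetical sketch; 'orders_ks' is a made-up keyspace name, and
we'd still need to stream the existing data into the Solr DC afterwards, e.g.
with nodetool rebuild):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class MakeTableSearchable {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Only this table's keyspace changes; every other table keeps its
        // existing replication settings untouched.
        session.execute("ALTER KEYSPACE orders_ks WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'cassandra': 3, 'solr': 2}");

        cluster.close();
    }
}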

I know that Thrift includes the keyspace as part of the connection state, so
if you're reading from or writing to many keyspaces, you end up making a lot
of additional round trips to switch keyspaces, and that hurts throughput. I
may be wrong, but I don't think this is true for the native protocol: if
we're using fully qualified names for all of our queries, I don't think we
incur the same overhead.
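
In other words, something like this should be fine over a single session (a
sketch with made-up keyspace and table names; no default keyspace, no USE
statements, every table name fully qualified):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class QualifiedQueries {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        // No argument to connect(): this session has no default keyspace.
        Session session = cluster.connect();

        // One session spans many keyspaces with no keyspace-switching round
        // trips, because every table name carries its keyspace.
        ResultSet searchable = session.execute(
                "SELECT * FROM orders_ks.orders LIMIT 10");
        ResultSet plain = session.execute(
                "SELECT * FROM audit_ks.audit_log LIMIT 10");

        System.out.println(searchable.one());
        System.out.println(plain.one());
        cluster.close();
    }
}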

I've had a look through the DataStax Java Driver's execution path, and I see
that it attempts to discover the keyspace used by each query, but that's to
help determine the candidate hosts for the token-aware policy. It does that
discovery when the session is initialized (see Metadata.java
http://grepcode.com/file/repo1.maven.org/maven2/com.datastax.cassandra/cassandra-driver-core/2.1.2/com/datastax/driver/core/Metadata.java/#381)
as well as when a topology change is detected. So it seems like it may
slightly slow down connect time, but the per-query cost at execution time
should be roughly constant regardless of the number of keyspaces.
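
For reference, this is the kind of configuration I'm describing (a sketch
against the 2.1-era driver API; the local DC name 'cassandra' is a
placeholder):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareSetup {
    public static void main(String[] args) {
        // The per-keyspace replica maps are built once at connect time (and
        // again on topology or schema changes); TokenAwarePolicy consults
        // them per query, so extra keyspaces add metadata, not per-query work.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy("cassandra")))
                .build();
        cluster.connect(); // triggers metadata and token map initialization
        cluster.close();
    }
}

TokenAwarePolicy just reorders the child policy's candidates so replicas are
tried first, so queries still stay pinned to the local DC.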

I know there is nontrivial overhead for each column family, but I have not
read or heard that there is nontrivial overhead for each keyspace.  Do you
have more information about that?


On Fri, Dec 12, 2014 at 10:26 AM, Ryan Svihla rsvi...@datastax.com wrote:

 Clarification: 'a keyspace for each' should read 'a keyspace for Cassandra
 tables and a keyspace for Solr tables'.

 On Fri, Dec 12, 2014 at 11:25 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 It would make more sense to just have a keyspace for each: something like
 solr_tables and cassandra_tables. I've done something similar with most
 customers using DSE Search (this isn't a DSE mailing list, but the
 information is interesting background for your question).

 There is a cost to each keyspace, and at some point the overhead of managing
 them all becomes expensive in terms of total heap usage (your mileage may
 vary depending on many factors). Breaking keyspaces up into logical
 replication groups makes the most sense from a maintainability and
 performance standpoint.

 On Fri, Dec 12, 2014 at 11:21 AM, Eric Stevens migh...@gmail.com wrote:

 We're considering moving to a model where we put each of our tables in a
 dedicated keyspace. This would let us tune replication per table and change
 our minds about that replication on a per-table basis without a major
 migration. The biggest driver for this is Solr integration: we want to tune
 RF into our Solr DC so that only the tables we want to search are replicated
 there (using NetworkTopologyStrategy with 'solr': 0 for tables which are not
 searchable).

 Has anyone else tried this? Is there any reason we might not want to do so,
 or any hidden gotchas we should be concerned about? Our total table count is
 small, in the tens range; our searchable tables number maybe 4 or 5.







Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Tyler Hobbs
On Fri, Dec 12, 2014 at 4:50 PM, Eric Stevens migh...@gmail.com wrote:


 I know that Thrift includes the keyspace as part of the connection state, so
 if you're reading from or writing to many keyspaces, you end up making a lot
 of additional round trips to switch keyspaces, and that hurts throughput. I
 may be wrong, but I don't think this is true for the native protocol: if
 we're using fully qualified names for all of our queries, I don't think we
 incur the same overhead.


That's correct. While you can set a default keyspace for a native protocol
connection, the ability to use fully qualified names means this doesn't
matter in the same way it did for Thrift.



 I've had a look through the DataStax Java Driver's execution path, and I see
 that it attempts to discover the keyspace used by each query, but that's to
 help determine the candidate hosts for the token-aware policy. It does that
 discovery when the session is initialized (see Metadata.java
 http://grepcode.com/file/repo1.maven.org/maven2/com.datastax.cassandra/cassandra-driver-core/2.1.2/com/datastax/driver/core/Metadata.java/#381)
 as well as when a topology change is detected. So it seems like it may
 slightly slow down connect time, but the per-query cost at execution time
 should be roughly constant regardless of the number of keyspaces.


This is also correct. On startup the driver builds a token ring (or replica
map) representation for each keyspace to assist TokenAwarePolicy. There's no
additional per-query overhead for extra keyspaces.



 I know there is nontrivial overhead for each column family, but I have not
 read or heard that there is nontrivial overhead for each keyspace.  Do you
 have more information about that?


The overhead for each keyspace is minor: some additional objects on the heap,
some more entries in the system tables, and a bit more metadata tracked by
the driver, but all of that is pretty lightweight.

The per-column-family overhead primarily comes from the way memory is
allocated for memtables. However, CASSANDRA-7882 should significantly improve
that: https://issues.apache.org/jira/browse/CASSANDRA-7882

-- 
Tyler Hobbs
DataStax http://datastax.com/