Well, we started with the thought that we'd have two keyspaces, one for
searchables and one for non-searchables, as you mentioned.  But our
concern is that we may change our mind in the future about which column
families are available for search.  Separate keyspaces per table give us
greater flexibility in that regard.
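
To make that concrete, here is a rough sketch of the statements involved
when a table's keyspace is created as non-searchable and later made
searchable.  The keyspace name (events_ks), the 'cassandra' DC name and the
replica counts are made up for illustration; only the 'solr' DC name and
the 'solr': 0 idea come from our actual plan.

```python
def keyspace_cql(action, keyspace, dc_replicas):
    """Build a CREATE or ALTER KEYSPACE statement for NetworkTopologyStrategy.

    dc_replicas maps a datacenter name to its replication factor.
    All names passed in here are hypothetical examples.
    """
    if action not in ("CREATE", "ALTER"):
        raise ValueError("action must be CREATE or ALTER")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # Sort datacenters so the generated statement is deterministic.
    parts += ["'%s': %d" % (dc, rf) for dc, rf in sorted(dc_replicas.items())]
    return "%s KEYSPACE %s WITH replication = {%s};" % (
        action, keyspace, ", ".join(parts))


# Start out non-searchable: no replicas in the Solr DC.
create_stmt = keyspace_cql("CREATE", "events_ks", {"cassandra": 3, "solr": 0})

# Change our mind later: send this one table's data to the Solr DC.
alter_stmt = keyspace_cql("ALTER", "events_ks", {"cassandra": 3, "solr": 1})

print(create_stmt)
print(alter_stmt)
```

With one keyspace per table, that ALTER touches only the one table.  The
existing data would still need a repair (or rebuild) to reach the new
replicas after the replication change.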

I know that Thrift includes the keyspace as part of the connection details,
so if you're reading or writing to many keyspaces, you'll end up having to
make a lot of additional round trips, and it will hurt your throughput.  I
may be wrong, but I don't think this is true for the native protocol.  If
we use fully qualified names for all of our queries, I don't think we
incur the same overhead.
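
By fully qualified I mean that every statement names its keyspace
explicitly, so a session opened without a default keyspace never has to
switch keyspaces between queries.  A minimal sketch follows; the helper,
the <table>_ks naming scheme and the id column are hypothetical, not
something we have in place.

```python
import re

# Plain (unquoted) CQL identifiers: a letter followed by letters, digits or "_".
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def qualified(keyspace, table):
    """Return 'keyspace.table', rejecting anything that is not a plain identifier."""
    for name in (keyspace, table):
        if not _IDENT.fullmatch(name):
            raise ValueError("not a plain CQL identifier: %r" % name)
    return "%s.%s" % (keyspace, table)


# One statement per table, each in its own dedicated keyspace.
queries = {
    table: "SELECT * FROM %s WHERE id = ?" % qualified(table + "_ks", table)
    for table in ("users", "events")
}

for table, cql in sorted(queries.items()):
    print(table, "->", cql)
```

Each of these strings can be prepared once and executed on the same
session, whichever keyspace the table lives in.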

I've had a look through the DataStax Java Driver's execution path, and I
see that it attempts to discover the keyspace used by each query, but
that's to help determine the candidate hosts for the token-aware policy.
It does that discovery when the session is initialized (see Metadata.java
<http://grepcode.com/file/repo1.maven.org/maven2/com.datastax.cassandra/cassandra-driver-core/2.1.2/com/datastax/driver/core/Metadata.java/#381>)
as well as when a topology change is detected.  So it seems like it may
slightly slow down connect time, but the cost per query at execution time
should be relatively static regardless of the number of keyspaces.

I know there is nontrivial overhead for each column family, but I have not
read or heard that there is nontrivial overhead for each keyspace.  Do you
have more information about that?


On Fri, Dec 12, 2014 at 10:26 AM, Ryan Svihla <rsvi...@datastax.com> wrote:

> Clarification: "keyspace for each" should be "keyspace for cassandra tables
> and solr tables"
>
> On Fri, Dec 12, 2014 at 11:25 AM, Ryan Svihla <rsvi...@datastax.com>
> wrote:
>>
>> It would make more sense to just have a keyspace for each. Something like
>> solr_tables and cassandra_tables. I've done something similar with most
>> customers using DSE search (not a DSE mailing list, but the information is
>> interesting background for your question).
>>
>> There is a cost to each keyspace, and you'll hit a level where the cost of
>> managing each keyspace gets expensive for your total heap usage (your
>> mileage may vary on lots of factors). Breaking up keyspaces into logical
>> replication groups makes the most sense from a maintainability and
>> performance standpoint.
>>
>> On Fri, Dec 12, 2014 at 11:21 AM, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>> We're considering moving to a model where we put each of our tables in a
>>> dedicated keyspace.  This is so we can tune replication per table, and
>>> change our mind about that replication on a per-table basis without a major
>>> migration.  The biggest driver for this is Solr integration: we want to
>>> tune RF into our Solr DC such that only tables which we want to search are
>>> sent there (using NetworkTopologyStrategy with 'solr': 0 for tables which
>>> are not searchable).
>>>
>>> Has anyone else tried this?  Is there any reason we might not want to do
>>> so?  Any hidden gotchas we should be concerned about?  Our total table
>>> count is small, in the tens range; our searchable tables are maybe 4 or 5.
>>>
>>
>>
>> --
>>
>> [image: datastax_logo.png] <http://www.datastax.com/>
>>
>> Ryan Svihla
>>
>> Solution Architect
>>
>> [image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
>> <http://www.linkedin.com/pub/ryan-svihla/12/621/727/>
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world’s most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world’s
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>>
>
