Re: various Cassandra performance problems when CQL3 is really used
> I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.

See the help for TRACE in cqlsh.

My general approach is to solve problems with the read path by making changes to the write path. So I would normally say make a new table to store the data you want to read, or change the layout of a table to be more flexible.

Can you provide the table definition and the query you are using?

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/01/2014, at 9:48 am, Ondřej Černoš <cern...@gmail.com> wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>>> * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
>>>
>>> Filtering absolutely kills the performance. On a table populated with 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
>>
>> Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory; that's a SQL-type operation. Your example query is somewhat complex and I doubt it could get decent performance - what does the query plan look like?
>
> I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.
>
> The example query gets computed in a fraction of the time if I just fetch all rows matching the token function and do the filtering client side.
>
>> IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.
>
> I denormalise all the way I can. The problem is I need to support paging and filtering at the same time. The API I must support allows filter-by-example and paging - so how should I denormalise? Should I somehow manage pages of primary row keys manually? Or should I have a manual secondary index and page somehow in the denormalised wide row?
>
> The trouble goes even further; even this doesn't perform well:
>
> select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;
>
> where id and pk_cluster are the primary key (CQL3 table). I guess this should be an ordered row query plus an ordered column slice query, so where is the problem with performance?
>
>>> By the way, the performance is an order of magnitude better if this patch is applied:
>>
>> That looks like it's tuned to your specific need; it would ignore the max results included in the query.
>
> It is tuned; it only demonstrates that the heuristic doesn't work well.
>
>>> * select id from table;
>>>
>>> As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with the current time (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.
>>
>> Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.
>
> The trouble is I just need to perform it, sometimes. I know what the problem with the query is, but I have only 150.000 records - the dataset can be stored in memory entirely, the SSTables can be fully mmapped. There is no reason for this query to be slow in this case.
>
>> Not sure what you mean by "scans all columns from all rows"; a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short circuits, though, and reads from fewer).
>
> The query should fetch only IDs, yet it checks the TTLs of the columns. That is the point. Why does it do it?
>
>> If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked.
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table or its columns.
>
>>> As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
>>
>> This is not true; could you provide an example of where you think this is happening?
>
> We saw it in the trace log. It happened in the select id from table query. The table had a collection column.
>
>>> Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with the trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
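As a concrete follow-up to the TRACE suggestion above: in cqlsh (Cassandra 1.2 and later) a trace can be captured like this; the table and query below are only placeholders:

    cqlsh> TRACING ON;
    cqlsh> SELECT id FROM some_table WHERE token(id) > token(some_value) LIMIT 10;
    cqlsh> TRACING OFF;

With tracing on, cqlsh prints each internal step of the request together with its elapsed time in microseconds, which is the closest thing Cassandra has to a query plan.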
Re: various Cassandra performance problems when CQL3 is really used
Hi,

by the way, some of the issues are summarised here: https://issues.apache.org/jira/browse/CASSANDRA-6586 and here: https://issues.apache.org/jira/browse/CASSANDRA-6587.

regards,

ondrej cernos

On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš <cern...@gmail.com> wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>>> * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
>>>
>>> Filtering absolutely kills the performance. On a table populated with 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
>>
>> Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory; that's a SQL-type operation. Your example query is somewhat complex and I doubt it could get decent performance - what does the query plan look like?
>
> I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.
>
> The example query gets computed in a fraction of the time if I just fetch all rows matching the token function and do the filtering client side.
>
>> IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.
>
> I denormalise all the way I can. The problem is I need to support paging and filtering at the same time. The API I must support allows filter-by-example and paging - so how should I denormalise? Should I somehow manage pages of primary row keys manually? Or should I have a manual secondary index and page somehow in the denormalised wide row?
>
> The trouble goes even further; even this doesn't perform well:
>
> select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;
>
> where id and pk_cluster are the primary key (CQL3 table). I guess this should be an ordered row query plus an ordered column slice query, so where is the problem with performance?
>
>>> By the way, the performance is an order of magnitude better if this patch is applied:
>>
>> That looks like it's tuned to your specific need; it would ignore the max results included in the query.
>
> It is tuned; it only demonstrates that the heuristic doesn't work well.
>
>>> * select id from table;
>>>
>>> As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with the current time (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.
>>
>> Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.
>
> The trouble is I just need to perform it, sometimes. I know what the problem with the query is, but I have only 150.000 records - the dataset can be stored in memory entirely, the SSTables can be fully mmapped. There is no reason for this query to be slow in this case.
>
>> Not sure what you mean by "scans all columns from all rows"; a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short circuits, though, and reads from fewer).
>
> The query should fetch only IDs, yet it checks the TTLs of the columns. That is the point. Why does it do it?
>
>> If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked.
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table or its columns.
>
>>> As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
>>
>> This is not true; could you provide an example of where you think this is happening?
>
> We saw it in the trace log. It happened in the select id from table query. The table had a collection column.
>
>>> Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with the trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
>>
>> What was the HW platform and what was the load?
>
> My i7/8GB notebook (single node cluster) and a virtualised AWS-like environment, on nodes of various sizes.
>
>> Typically freezes in the server correlate to JVM GC; the JVM GC can also be using the CPU. If you have wide rows or make large reads you may run into more JVM GC issues.
>>
>> nodetool flush will (as it says) flush all the tables to disk. If you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held, preventing write threads from progressing.
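The switch lock behaviour described above is governed by two cassandra.yaml settings. A minimal sketch with the 2.0-era defaults; the values are illustrative, check your own yaml:

    # cassandra.yaml
    # memtables flushed to disk in parallel
    memtable_flush_writers: 1
    # full memtables that may queue for flushing; each table *and* each
    # secondary index being flushed takes a slot, so with many tables or
    # indexes a full queue holds the switch lock and blocks writes
    memtable_flush_queue_size: 4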
Re: various Cassandra performance problems when CQL3 is really used
Hi,

thanks for the answer and sorry for the delay. Let me answer inline.

On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <aa...@thelastpickle.com> wrote:

>> * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
>>
>> Filtering absolutely kills the performance. On a table populated with 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
>
> Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory; that's a SQL-type operation. Your example query is somewhat complex and I doubt it could get decent performance - what does the query plan look like?

I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.

The example query gets computed in a fraction of the time if I just fetch all rows matching the token function and do the filtering client side.

> IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.

I denormalise all the way I can. The problem is I need to support paging and filtering at the same time. The API I must support allows filter-by-example and paging - so how should I denormalise? Should I somehow manage pages of primary row keys manually? Or should I have a manual secondary index and page somehow in the denormalised wide row?

The trouble goes even further; even this doesn't perform well:

select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;

where id and pk_cluster are the primary key (CQL3 table). I guess this should be an ordered row query plus an ordered column slice query, so where is the problem with performance?

>> By the way, the performance is an order of magnitude better if this patch is applied:
>
> That looks like it's tuned to your specific need; it would ignore the max results included in the query.

It is tuned; it only demonstrates that the heuristic doesn't work well.

>> * select id from table;
>>
>> As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with the current time (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.
>
> Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.

The trouble is I just need to perform it, sometimes. I know what the problem with the query is, but I have only 150.000 records - the dataset can be stored in memory entirely, the SSTables can be fully mmapped. There is no reason for this query to be slow in this case.

> Not sure what you mean by "scans all columns from all rows"; a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short circuits, though, and reads from fewer).

The query should fetch only IDs, yet it checks the TTLs of the columns. That is the point. Why does it do it?

> If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked.

It is a standard CQL3 table with an ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table or its columns.

>> As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
>
> This is not true; could you provide an example of where you think this is happening?

We saw it in the trace log. It happened in the select id from table query. The table had a collection column.

>> Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with the trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
>
> What was the HW platform and what was the load?

My i7/8GB notebook (single node cluster) and a virtualised AWS-like environment, on nodes of various sizes.

> Typically freezes in the server correlate to JVM GC; the JVM GC can also be using the CPU. If you have wide rows or make large reads you may run into more JVM GC issues.
>
> nodetool flush will (as it says) flush all the tables to disk. If you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held, preventing write threads from progressing. Once flush threads stop waiting on the flush queue the lock will be released. See the help for memtable_flush_queue_size in the yaml file.

I will check this, thanks.

> Major compaction is not recommended to be used in production. If you are seeing it cause performance problems I would guess it is related to JVM GC and/or the disk IO not being able to keep up.
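To make the filter-plus-paging question above concrete, one possible denormalisation is a lookup table per filterable attribute with the ids as the clustering key. All names here are hypothetical, and note it gives resume-from-last-key paging rather than the integer offset/limit the legacy API requires:

    -- one partition per filter value, ids sorted within it
    CREATE TABLE ids_by_filter_value (
        filter_value text,
        id uuid,
        PRIMARY KEY (filter_value, id)
    );

    -- first page
    SELECT id FROM ids_by_filter_value WHERE filter_value = 'val' LIMIT 50;

    -- next page: ? is a bind marker for the last id of the previous page
    SELECT id FROM ids_by_filter_value WHERE filter_value = 'val' AND id > ? LIMIT 50;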
Re: cassandra performance problems
Thanks all for your responses. We've downgraded from 2.0.3 to 2.0.0 and everything became normal.

2013/12/8 Nate McCall <n...@thelastpickle.com>

> If you are really set on using Cassandra as a cache, I would recommend disabling durable writes for the keyspace(s) [0]. This will bypass the commitlog (the flushing/rotation of which may be a good-sized portion of your performance problems given the number of tables).
>
> [0] http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/alter_keyspace_r.html
>
> On Fri, Dec 6, 2013 at 12:42 PM, J. Ryan Earl <o...@jryanearl.us> wrote:
>
>> On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:
>>
>>> We've plugged it into our production environment as a cache in front of postgres. Everything worked fine; we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra.
>>
>> If you just want a caching layer, why wouldn't you use Memcached or Redis instead? Cassandra is designed to be a persistent store and not so much a caching layer. If you were replacing your use of Postgres completely, that would be appropriate.
>
> --
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
various Cassandra performance problems when CQL3 is really used
Hi all,

we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabilities and no-spof qualities.

The dataset is small (6 tables having 150.000 records, a bunch of tables with up to thousands of records), but the model is not trivial - the mysql model has some 20+ tables, joins are frequent, m:n relationships are frequent, and the like. The interface is read heavy. We thought the size of the dataset should allow the whole dataset to fit into the memory of each node (3-node cluster in each DC, 3 replicas, local quorum operations), and that even though some operations (like secondary index lookup) are not superfast, due to the size of the dataset it should perform OK. We were wrong.

We use CQL3 exclusively and we use all of its capabilities (collections, secondary indexes, filtering), because they make the data model maintainable. We denormalised what had to be denormalised in order to avoid client-side joins. The usual query to the storage means one CQL query on a denormalised table. We need to support integer offset/limit paging, filter-by-example kind of queries, m:n relationship queries and all the usual suspects of an old SQL-backed interface.

This is the list of operations we identified so far that perform really poorly. The row id is called id in the following:

* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;

Filtering absolutely kills the performance. On a table populated with 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.

By the way, the performance is an order of magnitude better if this patch is applied:

diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
index 5ab1df6..13af671 100644
--- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
+++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
@@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
     private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
     // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
-    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    //private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    private final int rowsPerQuery = 10;

     public boolean needsFiltering()
     {

* select id from table;

As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with the current time (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.

* CQL collections

See the point above about mixing wide and narrow rows. As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
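For concreteness, a minimal sketch of the kind of table in play - an id, a couple of regular columns and one collection; the names are invented:

    CREATE TABLE inventory_item (
        id uuid PRIMARY KEY,
        name text,
        description text,
        tags set<text>   -- the CQL3 collection
    );

    -- queries only ids, yet in our traces the whole row, collection
    -- cells included, was read server side:
    SELECT id FROM inventory_item;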
Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with the trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.

Conclusions:

- do not use collections
- do not use secondary indexes
- do not use filtering
- have your rows as narrow as possible if you need any kind of all row keys traversal

With these conclusions in mind, CQL seems redundant: plain old thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?

Thanks for reading,

ondrej cernos
Re: various Cassandra performance problems when CQL3 is really used
> * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
>
> Filtering absolutely kills the performance. On a table populated with 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.

Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory; that's a SQL-type operation. Your example query is somewhat complex and I doubt it could get decent performance - what does the query plan look like?

IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.

> By the way, the performance is an order of magnitude better if this patch is applied:

That looks like it's tuned to your specific need; it would ignore the max results included in the query.

> * select id from table;
>
> As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with the current time (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.

Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.

Not sure what you mean by "scans all columns from all rows"; a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short circuits, though, and reads from fewer).

If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked.

> As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.

This is not true; could you provide an example of where you think this is happening?

> Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with the trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.

What was the HW platform and what was the load?

Typically freezes in the server correlate to JVM GC; the JVM GC can also be using the CPU. If you have wide rows or make large reads you may run into more JVM GC issues.

nodetool flush will (as it says) flush all the tables to disk. If you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held, preventing write threads from progressing. Once flush threads stop waiting on the flush queue the lock will be released. See the help for memtable_flush_queue_size in the yaml file.

Major compaction is not recommended to be used in production. If you are seeing it cause performance problems I would guess it is related to JVM GC and/or the disk IO not being able to keep up. When used it creates a single SSTable for each table, which will not be compacted again until (by default) 3 other large SSTables are created or you run major compaction again. For this reason it is not recommended.

> Conclusions:
>
> - do not use collections
> - do not use secondary indexes
> - do not use filtering
> - have your rows as narrow as possible if you need any kind of all row keys traversal

These features all have a use, but it looks like you leaned on them heavily while creating a relational model.

Especially the filtering: you have to explicitly enable it to prevent the client sending queries that will take a long time. The only time row key traversal is normally used is reading data through hadoop.

You should always strive to read row(s) from a table by the full or partial primary key.

> With these conclusions in mind, CQL seems redundant: plain old thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?

No. CQL provides a set of functionality not present in the thrift API. Joins and indexes should generally be handled by denormalising the data during writes.

It sounds like your data model was too relational; you need to denormalise and read rows by primary key. Secondary indexes are useful when you have a query pattern that is used infrequently.

Hope that helps.

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 18/12/2013, at 3:47 am, Ondřej Černoš <cern...@gmail.com> wrote:

> Hi all,
>
> we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabilities and no-spof qualities.
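As a hedged sketch of denormalising during writes (tables and columns invented for illustration): every write goes to the base table and to a query table keyed by the attribute you filter on, so the read becomes a primary key lookup with no secondary index and no filtering:

    BEGIN BATCH
        INSERT INTO items (id, colour, name) VALUES (?, ?, ?);
        INSERT INTO items_by_colour (colour, id, name) VALUES (?, ?, ?);
    APPLY BATCH;

    -- read side
    SELECT id, name FROM items_by_colour WHERE colour = 'red';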
Re: cassandra performance problems
If you are really set on using Cassandra as a cache, I would recommend disabling durable writes for the keyspace(s) [0]. This will bypass the commitlog (the flushing/rotation of which may be a good-sized portion of your performance problems given the number of tables).

[0] http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/alter_keyspace_r.html

On Fri, Dec 6, 2013 at 12:42 PM, J. Ryan Earl <o...@jryanearl.us> wrote:

> On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:
>
>> We've plugged it into our production environment as a cache in front of postgres. Everything worked fine; we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra.
>
> If you just want a caching layer, why wouldn't you use Memcached or Redis instead? Cassandra is designed to be a persistent store and not so much a caching layer. If you were replacing your use of Postgres completely, that would be appropriate.

--
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
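The statement behind [0] is a one-liner; the keyspace name below is a placeholder. The trade-off is that a crashed node loses whatever was not yet flushed, which is tolerable here only because the keyspace is a cache over postgres:

    ALTER KEYSPACE cache_keyspace WITH DURABLE_WRITES = false;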
Re: cassandra performance problems
On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:

> We've plugged it into our production environment as a cache in front of postgres. Everything worked fine; we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra.

If you just want a caching layer, why wouldn't you use Memcached or Redis instead? Cassandra is designed to be a persistent store and not so much a caching layer. If you were replacing your use of Postgres completely, that would be appropriate.
cassandra performance problems
Hi all,

We have a 3-node cluster setup, single keyspace, about 500 tables. The hardware is 2 cores + 16 GB RAM (Cassandra chose a 4GB heap). The Cassandra version is 2.0.3. Our replication factor is 3, read/write consistency is QUORUM.

We've plugged it into our production environment as a cache in front of postgres. Everything worked fine; we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra.

Then the problems came. Our nodes began showing high cpu usage (around 20). The funny thing is that they were actually doing it one after another, and there was always only one node with high cpu usage. Using OpsCenter we saw that when the CPU was beginning to go high, the node in question was performing compaction. But even after the compaction finished the cpu remained high, and in some cases didn't go down for hours. Our jmx monitoring showed that it was presumably in constant garbage collection. During that time the cluster read latency goes from 2ms to 200ms.

What can be the reason? Can it be the high number of tables? Do we need to adjust some settings for this setup? Is it ok to have so many tables? Theoretically we could stick them all into 3-4 tables.

Thanks in advance,
Alexander
Re: cassandra performance problems
Do you mean high CPU usage or high load avg? (20 indicates load avg to me.) A high load avg means the CPU is waiting on something. Check iostat -dmx 1 100 to look at your disk stats; you'll see the columns that indicate MB/s read and written as well as % utilization.

Once you understand the bottleneck we can start to narrow down the cause.

On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:

> Hi all,
>
> We have a 3-node cluster setup, single keyspace, about 500 tables. The hardware is 2 cores + 16 GB RAM (Cassandra chose a 4GB heap). The Cassandra version is 2.0.3. Our replication factor is 3, read/write consistency is QUORUM.
>
> We've plugged it into our production environment as a cache in front of postgres. Everything worked fine; we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra.
>
> Then the problems came. Our nodes began showing high cpu usage (around 20). The funny thing is that they were actually doing it one after another, and there was always only one node with high cpu usage. Using OpsCenter we saw that when the CPU was beginning to go high, the node in question was performing compaction. But even after the compaction finished the cpu remained high, and in some cases didn't go down for hours. Our jmx monitoring showed that it was presumably in constant garbage collection. During that time the cluster read latency goes from 2ms to 200ms.
>
> What can be the reason? Can it be the high number of tables? Do we need to adjust some settings for this setup? Is it ok to have so many tables? Theoretically we could stick them all into 3-4 tables.
>
> Thanks in advance,
> Alexander

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: cassandra performance problems
On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:

> Cassandra version is 2.0.3. ... We've plugged it into our production environment as a cache in front of postgres.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

> What can be the reason? Can it be the high number of tables? Do we need to adjust some settings for this setup? Is it ok to have so many tables? Theoretically we could stick them all into 3-4 tables.

500 column families (tables) is a lot, but not necessarily too many.

> During that time the cluster read latency goes from 2ms to 200ms.

In perfectly normal operation, things operating in the JVM may exhibit a few hundred milliseconds of latency when the JVM pauses for garbage collection. If you absolutely require consistent latency below this time for all requests, something running in the JVM may not be for you.

If compaction triggers runaway GC, then you probably have too much steady-state heap consumption. You don't mention anything about the hardware environment, how much heap is available, etc.?

=Rob
Re: cassandra performance problems
Thanks for your answers.

Jonathan: yes, it was load avg, and iowait was below 2% the whole time - the only load was the user one.

Robert: we had -Xmx4012m, which was calculated automatically by the default cassandra-env.sh (1/4 of the 16G total memory) - we didn't change that.

2013/12/5 Robert Coli <rc...@eventbrite.com>

> On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:
>
>> Cassandra version is 2.0.3. ... We've plugged it into our production environment as a cache in front of postgres.
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
>> What can be the reason? Can it be the high number of tables? Do we need to adjust some settings for this setup? Is it ok to have so many tables? Theoretically we could stick them all into 3-4 tables.
>
> 500 column families (tables) is a lot, but not necessarily too many.
>
>> During that time the cluster read latency goes from 2ms to 200ms.
>
> In perfectly normal operation, things operating in the JVM may exhibit a few hundred milliseconds of latency when the JVM pauses for garbage collection. If you absolutely require consistent latency below this time for all requests, something running in the JVM may not be for you.
>
> If compaction triggers runaway GC, then you probably have too much steady-state heap consumption. You don't mention anything about the hardware environment, how much heap is available, etc.?
>
> =Rob
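For reference, the calculation Alexander refers to lives in cassandra-env.sh and can be overridden there. A sketch, assuming the stock 2.0-era script, which sizes the heap to roughly max(min(1/2 RAM, 1G), min(1/4 RAM, 8G)) - about 4G on a 16G box:

    # cassandra-env.sh - override the computed defaults; the stock
    # script expects you to set both of these or neither
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"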