Issue with leveled compaction and data migration
Hello,

We've been running a migration on Cassandra 1.1.9 in which we are combining two column families. We are incrementally moving data from one column family into another: the columns in a row of the source column family are appended to the columns of the matching row in the target column family. Both column families use leveled compaction, and both have over 100 million rows. However, the bloom filters on the target column family have grown dramatically (just short of doubling) after converting less than 1/4 of the data. I assume this is because new changes are not being compacted with older changes, although I thought leveled compaction would mitigate this for me. Any advice on what we can do to control bloom filter growth during this migration?

Appreciate the help,
Thanks,
-Mike
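If the growth is coming from the default bloom filter settings, one knob worth knowing about is the per-column-family bloom_filter_fp_chance: raising the target false-positive rate shrinks the filters, at the cost of extra disk seeks on reads of missing keys. A hedged sketch for cassandra-cli on the 1.1 line (the keyspace and column family names are placeholders, 0.1 is only an illustrative value, and you should verify the attribute is supported on your exact version):

```
use MyKeyspace;
update column family target_cf with bloom_filter_fp_chance = 0.1;
```

Note the new setting only applies to SSTables written from then on, so the filters shrink gradually as compaction rewrites existing data.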
Re: heavy insert load overloads CPUs, with MutationStage pending
Paul- Sorry to go off-list, but I'm diving pretty far into details here. Ignore if you wish.

Thanks a lot for the example, definitely very helpful. I'm surprised that the Cassandra experts aren't more interested in / alarmed by our results; it seems like we've proved that insert performance for wide rows in CQL is enormously worse than it was before CQL. And I have a feeling 2.0 won't help much -- I'm already using entirely-prepared batches.

To reproduce your example, I switched to Cassandra 1.2.6 and Astyanax 1.56.42. But anything I try to do with that version combination gives me an exception on the client side (e.g. execute() on a query):

13-09-13 15:42:42.511 [pool-6-thread-1] ERROR c.n.a.t.ThriftSyncConnectionFactoryImpl - Error creating connection
java.lang.NoSuchMethodError: org.apache.cassandra.thrift.TBinaryProtocol: method <init>(Lorg/apache/thrift/transport/TTransport;)V not found
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.open(ThriftSyncConnectionFactoryImpl.java:195) ~[astyanax-thrift-1.56.37.jar:na]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$1.run(ThriftSyncConnectionFactoryImpl.java:232) [astyanax-thrift-1.56.37.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_07]

From my googling this is due to a Cassandra API change in TBinaryProtocol, which is why I had to use Cassandra 1.2.5 jars to get my Astyanax client to work at all in my earlier experiments. Did you encounter this? Also, you had 1.2.8 in the Stack Overflow post, but 1.2.6 in this email; did you have to roll back?

Thanks for any help you can offer, hope I can return the favor at some point.

On 09/12/2013 02:26 PM, Paul Cichonski wrote: I'm running Cassandra 1.2.6 without compact storage on my tables.
The trick is making your Astyanax (I'm running 1.56.42) mutation work with the CQL table definition (this is definitely a bit of a hack, since most of the advice says don't mix the CQL and Thrift APIs, so it is your call on how far you want to go). If you still want to try it out, you need to leverage the Astyanax CompositeColumn construct to make it work (https://github.com/Netflix/astyanax/wiki/Composite-columns). I've provided a slightly modified version of what I am doing below.

CQL table def:

CREATE TABLE standard_subscription_index (
    subscription_type text,
    subscription_target_id text,
    entitytype text,
    entityid int,
    creationtimestamp timestamp,
    indexed_tenant_id uuid,
    deleted boolean,
    PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
)

ColumnFamily definition:

private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
    new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
        SUBSCRIPTION_CF_NAME,
        new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
        new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));

SubscriptionIndexCompositeKey is a class that contains the fields from the row key (e.g., subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains the fields from the composite column (as it would look if you view your data using cassandra-cli), so: entityType, entityId, columnName.

The columnName field is the tricky part, as it defines what to interpret the column value as (i.e., if it is a value for creationtimestamp, the column might be someEntityType:4:creationtimestamp).

The actual mutation looks something like this:

final MutationBatch mutation = getKeyspace().prepareMutationBatch();
final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(COMPOSITE_ROW_COLUMN,
    new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
for (Subscription sub : subs) {
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(), "creationtimestamp"), sub.getCreationTimestamp());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(), "deleted"), sub.isDeleted());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(), "indexed_tenant_id"), tenantId);
}

Hope that helps,
Paul
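To make the columnName encoding concrete: as described above, each physical column's name is a composite of the clustering values plus the name of the CQL column it stores. A plain-Java sketch of that rendering (this is not Astyanax code; the class and method names here are made up for illustration):

```java
// Illustrative only: renders the cassandra-cli view of a composite
// column name, e.g. "someEntityType:4:creationtimestamp".
public class CompositeColumnName {

    // entityType and entityId come from the clustering columns;
    // field names the CQL column whose value this physical column holds.
    static String render(String entityType, int entityId, String field) {
        return entityType + ":" + entityId + ":" + field;
    }

    public static void main(String[] args) {
        // The creationtimestamp cell for entity (someEntityType, 4):
        System.out.println(render("someEntityType", 4, "creationtimestamp"));
    }
}
```

Each CQL row thus fans out into one physical column per non-key column, which is why the mutation above issues three putColumn calls per Subscription.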
Re: heavy insert load overloads CPUs, with MutationStage pending
https://github.com/Netflix/astyanax/issues/391

I've gotten in touch with a couple of Netflix folks and they are going to try to roll a release shortly. You should be able to build against 1.2.2, and talking to a 1.2.9 instance should work. Just a PITA development-wise to maintain different versions.

On Fri, Sep 13, 2013 at 10:52 AM, Keith Freeman 8fo...@gmail.com wrote:
Re: Normal OS: Disk Throughput levels for EC2
My apologies; this is information that should have been in my original email. m1.xlarges, using a single RAID 0 ephemeral array for both data and the commit log. The latest burst write was ~150GB across 3 nodes (RF=3, so 150GB per node) with an 8GB heap, but no major spikes show up on the OpsCenter graph for write throughput.

On Fri, Sep 13, 2013 at 7:45 AM, Nate McCall n...@thelastpickle.com wrote: This can vary pretty heavily by instance type and storage options. What size instances are these and how is the storage configured? On Fri, Sep 13, 2013 at 1:11 AM, David Ward da...@shareablee.com wrote: I noticed on EC2, the c* nodes according to OpsCenter have never gone above 1.6-2.2MBps. That seems abnormally low, but I have no reference as to what is normal for Cassandra on EC2, and I'm curious what other people have seen according to OpsCenter for the OS: Disk Throughput metric. Thanks, Dave
Re: heavy insert load overloads CPUs, with MutationStage pending
Also, I was working on this a bit for a client, so I compiled my notes and approach into a blog post for posterity (and so it's easier for others to find): http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html

Paul's method on this thread is cited at the bottom as well.

On Fri, Sep 13, 2013 at 11:16 AM, Nate McCall n...@thelastpickle.com wrote:
is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
I was just wondering if Cassandra has any special CF where every row exists on every node, for smaller tables that we would want to leverage in map/reduce. The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast, since for every row we process we do a read against this table. Thanks, Dean
Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
It sounds like something that's only useful in a really limited use case. In an 11-node cluster, quorum reads/writes would need to come from 6 nodes. It would probably be much slower for both reads and writes. It sounds like what you want is a database with replication, not partitioning.

On Sep 13, 2013, at 11:15 AM, Hiller, Dean dean.hil...@nrel.gov wrote: When I add nodes though, I would kind of be screwed there, right? Is there an RF=${nodecount}… that would be neat. Dean

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Friday, September 13, 2013 12:06 PM
To: user@cassandra.apache.org
Subject: Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce. The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every row, we read into this table.

Create a keyspace with replication configured such that RF=N? =Rob
Re: Normal OS: Disk Throughput levels for EC2
This can vary pretty heavily by instance type and storage options. What size instances are these and how is the storage configured? On Fri, Sep 13, 2013 at 1:11 AM, David Ward da...@shareablee.com wrote: I noticed on EC2, the c* nodes according to OpsCenter have never gone above 1.6-2.2MBps. That seems abnormally low but I have no reference as to what is normal for cassandra on EC2 and curious what other people have seen according to OpsCenter for the OS: Disk Throughput metric. Thanks, Dave
Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce. The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every row, we read into this table. Create a keyspace with replication configured such that RF=N? =Rob
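Rob's RF=N suggestion, spelled out as a hedged sketch (the keyspace name is hypothetical and 6 stands in for the actual node count; with SimpleStrategy you must bump the replication factor and run repair whenever you add a node):

```
CREATE KEYSPACE lookup_everywhere
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 6};
```

That is CQL3 syntax; on older versions the equivalent is set via strategy_options in cassandra-cli.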
Re: Normal OS: Disk Throughput levels for EC2
You should give further information if you want an answer. What kind of instance is it? Instance store / EBS / EBS-optimized? Do you try to read/write on this disk? How much? ... With an m1.xlarge we reached 40 MBps, and now with a hi1.4xlarge we haven't reached any limit yet, and we have 100+ MBps. My guess is that either you are only writing (so few disk ops, because of memtables) or your client processes data too slowly to make full use of your hardware. Alain

2013/9/13 David Ward da...@shareablee.com I noticed on EC2, the c* nodes according to OpsCenter have never gone above 1.6-2.2MBps. That seems abnormally low but I have no reference as to what is normal for cassandra on EC2 and curious what other people have seen according to OpsCenter for the OS: Disk Throughput metric. Thanks, Dave
Nodes separating from the ring
Hi, all - We've been running Cassandra 1.1.12 in production since February, and have experienced a vexing problem with an arbitrary node falling out of, or separating from, the ring on occasion. When a node falls out of the ring, running nodetool ring on the misbehaving node shows that the misbehaving node believes it is Up but that the rest of the ring is Down, with question marks listed for the other nodes' load. nodetool ring on any of the other nodes, however, shows the misbehaving node as Down but everything else as Up. Shutting down and restarting the misbehaving node does not change this behavior. We can only get the misbehaving node to rejoin the ring by shutting it down, then running nodetool removetoken <misbehaving node's token> and nodetool removetoken force elsewhere in the ring. After the node's token has been removed from the ring, it will rejoin and behave normally when it is restarted.

This is not a frequent occurrence; we can go months between incidents. It most commonly occurs when a different node is brought down and then back up, but it can happen spontaneously. It is also not associated with a network connectivity event; we've seen no interruption in the nodes' ability to communicate over the network. As above, it's also not isolated to a single node; we've seen this behavior on multiple nodes. This has occurred both with identical seed lists specified in cassandra.yaml on each node, and when we remove a node from its own seed list (so a seed won't try to auto-bootstrap from itself). Seeds have always been up and available.

Has anyone else seen similar behavior? For obvious reasons, we hate seeing one of the nodes suddenly fall out and require intervention when we flap another node, or for no reason at all. Thanks, Dave
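For reference, the recovery sequence described above reads like this as a runbook (a hedged sketch: the token is a placeholder, removetoken must be issued from a healthy node while the misbehaving node is down, and the service commands vary by install):

```
# 1. On the misbehaving node: stop Cassandra.
sudo service cassandra stop

# 2. On any healthy node in the ring: remove the bad node's token.
nodetool removetoken <misbehaving-node-token>
#    If the removal sticks in a pending state, force completion:
nodetool removetoken force

# 3. On the misbehaving node: start Cassandra; it should rejoin cleanly.
sudo service cassandra start
```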
Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
That's an interesting idea… so that would be an RF=1 in each data center… very interesting. Dean

From: Jonathan Haddad j...@jonhaddad.com
Reply-To: user@cassandra.apache.org
Date: Friday, September 13, 2013 1:50 PM
To: user@cassandra.apache.org
Subject: Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

You could create a bunch of 1-node DCs if you really wanted it.

On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Actually, I have been on a few projects where something like that is useful. Gemfire (a grid memory cache) had that feature, which we used at another company. On every project I encounter, there is usually one small table somewhere… either metadata or something that is infrequently changing and nice to duplicate on every node. I bet NoSQL stores may eventually add it, maybe in a few years, but I guess we are not there yet. Thanks, Dean

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
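Jonathan's bunch-of-1-node-DCs idea, sketched in CQL3 (the data center names must match what your snitch reports; dc1 through dc3 here are invented, and each gets one replica so every node stores the full table):

```
CREATE KEYSPACE on_every_node
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'dc1': 1, 'dc2': 1, 'dc3': 1};
```

Adding a node then means defining a new single-node DC and adding it to the replication map, rather than changing a single replication factor.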
Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
On Fri, Sep 13, 2013 at 11:15 AM, Hiller, Dean dean.hil...@nrel.gov wrote: When I add nodes though, I would kind of be screwed there, right? Is there an RF=${nodecount}…that would be neat. Increasing replication factor is well understood, and in this case you could pre-load the entire dataset onto the new node instead of having it bootstrap. But the DC-per-node idea is.. kinda interesting.. =Rob