Re: Decommission an entire DC
And if we want to add a new DC? I suppose we should add all the nodes and then alter the replication factor of the keyspace, but can anyone confirm that and maybe give me some tips? FYI, we have 2 DCs with between 10 and 20 nodes in each and a 2 TB database (local replication factor included). Thanks -- Cyril SCETBON On Jul 24, 2013, at 12:04 AM, Omar Shibli o...@eyeviewdigital.com wrote: All you need to do is decrease the replication factor of DC1 to 0, and then decommission the nodes one by one. I've tried this before and it worked with no issues. Thanks, On Tue, Jul 23, 2013 at 10:32 PM, Lanny Ripple la...@spotright.com wrote: Hi, We have a multi-DC setup using DC1:2, DC2:2. We want to get rid of DC1. We're in the position where we don't need to save any of the data on DC1. We know we'll lose a (tiny, already checked) bit of data but our processing is such that we'll recover over time. How do we drop DC1 and just move forward with DC2? Using nodetool decommission or removetoken looks like we'll eventually end up with a single DC1 node containing the entire DC's data, which would be slow and costly. We've speculated that setting DC1:0 or removing it from the schema would do the trick, but without finding any hits while searching on that idea I hesitate to just do it. We can drop DC1's data but have to keep a working ring in DC2.
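To make Omar's suggestion concrete, here is a rough sketch of the sequence in 1.2-era CQL3 (keyspace and DC names are placeholders; on Thrift-era schemas the same change is made with update keyspace in cassandra-cli):

    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 2};
      -- DC1 is simply omitted, which drops its replication factor to 0

    # then, on each DC1 node in turn:
    nodetool decommission

For adding a DC, the reverse order applies: bring the new nodes up first, then raise the replication factor for the new DC and run nodetool rebuild (naming an existing DC as the source) on each new node.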
RE: disappointed
Hi Paul, Sorry to hear you're having a low point. We ended up not using the collection features of 1.2. Instead we store a compressed string containing the map and handle it client side. We only have fixed-schema short rows, so no experience with large row compaction. File descriptors have never got that high for us. But, if you only have a couple of physical nodes with loads of data and small SSTables, maybe they could get that high? The only time I've had file descriptors get out of hand was when compaction got slightly confused with a new schema when I dropped and recreated instead of truncating. https://issues.apache.org/jira/browse/CASSANDRA-4857 restarting the node fixed the issue. From my limited experience I think Cassandra is a dangerous choice for a young, limited funding/experience start-up expecting to scale fast. We are a fairly mature start-up with funding. We've just spent 3-5 months moving from Mongo to Cassandra. It's been expensive and painful getting Cassandra to read like Mongo, but we've made it :) From: Paul Ingalls [mailto:paulinga...@gmail.com] Sent: 24 July 2013 06:01 To: user@cassandra.apache.org Subject: disappointed I want to check in. I'm sad, mad and afraid. I've been trying to get a 1.2 cluster up and working with my data set for three weeks with no success. I've been running a 1.1 cluster for 8 months now with no hiccups, but for me at least 1.2 has been a disaster. I had high hopes for leveraging the new features of 1.2, specifically vnodes and collections. But at this point I can't release my system into production, and will probably need to find a new back end. As a small startup, this could be catastrophic. I'm mostly mad at myself. I took a risk moving to the new tech. I forgot sometimes when you gamble, you lose. First, the performance of 1.2.6 was horrible when using collections. I wasn't able to push through 500k rows before the cluster became unusable. With a lot of digging, and way too much time, I discovered I was hitting a bug that had just been fixed, but was unreleased. This scared me, because the release was already at 1.2.6 and I would have expected something like https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed long before. But gamely I grabbed the latest code from the 1.2 branch, built it, and I was finally able to get past half a million rows. But then I hit ~4 million rows, and a multitude of problems. Even with the fix above, I was still seeing a ton of compactions failing, specifically the ones for large rows. Not a single large row will compact; they all assert with the wrong size. Worse, and this is what kills the whole thing, I keep hitting a wall with open files, even after dumping the whole DB, dropping vnodes and trying again. Seriously, 650k open file descriptors? When it hits this limit, the whole DB craps out and is basically unusable. This isn't that many rows. I have close to half a billion in 1.1… I'm now at a standstill. I figure I have two options unless someone here can help me. Neither of them involve 1.2. I can either go back to 1.1 and remove the features that collections added to my service, or I find another data backend that has similar performance characteristics to Cassandra but allows collections-type behavior in a scalable manner. Cause as far as I can tell, 1.2 doesn't scale. Which makes me sad, I was proud of what I accomplished with 1.1… Does anyone know why there are so many open file descriptors? Any ideas on why a large row won't compact? Paul
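For what it's worth, when file descriptor counts blow up like this it is worth measuring before raising limits. A minimal sketch on Linux (the pgrep pattern is an assumption; match however your Cassandra process is actually named):

    # count open file descriptors held by the Cassandra JVM
    lsof -n -p $(pgrep -f CassandraDaemon) | wc -l

    # confirm the limit the running process actually got
    grep 'open files' /proc/$(pgrep -f CassandraDaemon)/limits

Most entries will be SSTable data and index files, so a huge count usually points at compaction falling behind or an SSTable leak rather than at the limit itself being too low.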
Re: disappointed
Hi Paul, Concerning large rows which are not compacting, I've probably managed to reproduce your problem. I suppose you're using collections, but also TTLs? Anyway, I opened an issue here: https://issues.apache.org/jira/browse/CASSANDRA-5799 Hope this helps 2013/7/24 Christopher Wirt chris.w...@struq.com: [...] -- Fabien Rousseau aur...@yakaz.com www.yakaz.com
Re: disappointed
From my limited experience I think Cassandra is a dangerous choice for a young limited funding/experience start-up expecting to scale fast. It's not dangerous, just don't try to be smart: follow what other big Cassandra users like Twitter, Netflix, Facebook, etc. are using. If they are still at 1.1, then do not rush to 1.2. You can get all the information you need from GitHub and their Maven repos. The same method can be used for any other non-mainstream software like Scala and Hadoop. Also, every new Cassandra branch comes with an extensive number of difficult-to-spot bugs, and it takes about half a year to stabilize. New features should usually be avoided; best is to stay one major version behind. This is true for almost any mission-critical software. You can help with testing the Cassandra 2.0 beta. Create your test suite and run it against your target Cassandra version. The test suite also needs to track performance. From my testing, performance of 2.0 is about the same as 1.2 for my workload. I had a lot of problems after I migrated from a really well-working 0.8.x to 1.0.5. Even though preproduction testing did not discover any problems, there were memory leaks in 1.0.5, hint delivery was broken, and there was a problem with repair making old tombstones reappear, causing a snowball effect. That last one was fixed about a year later in mainstream C*, after I fixed it myself because no dev believed me that such a thing could happen.
MapReduce response time and speed
Hi, I am Jan Algermissen (REST-head, freelance programmer/consultant) and Cassandra newbie. I am looking at Cassandra for an application I am working on. There will be a max. of 10 million items (texts and attributes of a retailer's products) in the database. There will be occasional writes (e.g. price updates). The use case for the application is to work on the whole data set, item by item, to produce 'exports'. It will be necessary to access the full set every time. There is no relationship between the items. Processing is done iteratively. My question: I am thinking that this is an ideal scenario for map-reduce, but I am unsure about two things: Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing; what is the expected response time? Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or more? I understand that map-reduce is not for ad-hoc 'querying', but my users expect the system to feel quasi-interactive, because they intend to refine the processing job based on the results they get. A short gap would be OK, but a definite gap on the order of 10+ minutes would not. (For example, as far as I learned, with Riak you would most certainly have such a gap. How about Cassandra? Throwing more nodes at the problem would be OK, I just need to understand whether there is a definite 'response time penalty' I have to expect no matter what.) Jan
Cassandra and RAIDs
Hi, second question: is it recommended to set up Cassandra using 'RAID-ed' disks for per-node reliability, or do people usually just rely on having multiple nodes anyway? Why bother with replicated disks? Jan
Re: Cassandra and RAIDs
From: http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning

* RAID on data disks: It is generally not necessary to use RAID for the following reasons:
  * Data is replicated across the cluster based on the replication factor you've chosen.
  * Starting in version 1.2, Cassandra takes care of disk management with the JBOD (just a bunch of disks) support feature. Because Cassandra properly reacts to a disk failure, based on your availability/consistency requirements, either by stopping the affected node or by blacklisting the failed drive, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.
* RAID on the commit log disk: Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.

Andy

On 24 Jul 2013, at 15:36, Jan Algermissen jan.algermis...@nordsc.com wrote: [...]
Re: Cassandra and RAIDs
On 24 July 2013 15:36, Jan Algermissen jan.algermis...@nordsc.com wrote: is it recommended to set up Cassandra using 'RAID-ed' disks for per-node reliability or do people usually just rely on having multiple nodes anyway - why bother with replicated disks? It's not necessary, due to replication as you say. You can give Cassandra your JBOD disks and it will split data between them and avoid a disk (or fail the node, you can choose) if one fails. There are some reasons to consider RAID though:

* It is probably quicker, and places no load on the rest of the cluster, to do a RAID rebuild rather than a nodetool rebuild/repair. The importance of this depends on how much data you have and the load on your cluster. If you don't have much data per node, or if there is spare capacity, then RAID will offer no benefit here.
* Using JBOD, the largest SSTable you can have is limited to the size of one disk. This is unlikely to cause problems in most scenarios, but an erroneous nodetool compact could cause problems if your data size is greater than can fit on any one disk.

Richard.
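For reference, the JBOD setup Richard describes is plain cassandra.yaml configuration in 1.2; a sketch with placeholder paths:

    # cassandra.yaml: list one data directory per physical disk
    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data

    # behaviour on disk failure: 'stop' halts the node,
    # 'best_effort' blacklists the dead disk and keeps serving
    disk_failure_policy: best_effort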
Re: MapReduce response time and speed
You have a lot of questions there, so I can't answer them all, but for the following: *Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing; what is the expected response time?* You can use high-level tools like Pig, Hive and Oozie. But mind you, it will depend on your data size, the complexity of the job, the cluster, and tuning parameters. Regards, Shahab On Wed, Jul 24, 2013 at 10:33 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: [...]
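As a rough illustration of the ad-hoc style Pig allows (keyspace and column family names here are invented; the cassandra:// URI form is the one used by the CassandraStorage loader shipped with 1.2-era Cassandra, but check the versions you deploy):

    -- count all items; this can be edited and resubmitted without recompiling anything
    rows = LOAD 'cassandra://retail/products'
           USING org.apache.cassandra.hadoop.pig.CassandraStorage();
    grouped = GROUP rows ALL;
    total = FOREACH grouped GENERATE COUNT(rows);
    DUMP total;

Whether this lands in Jan's couple-of-minutes budget depends entirely on cluster size and job complexity, as Shahab says; Pig removes the developer-compile step, not the scan cost.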
Re: disappointed
Same type of error, but I'm not currently using TTLs. I am, however, generating a lot of tombstones as I add elements to collections… On Jul 24, 2013, at 6:42 AM, Fabien Rousseau fab...@yakaz.com wrote: [...]
Re: disappointed
Hi Chris, Thanks for the response! What kind of challenges did you run into that kept you from using collections? I'm currently running 4 physical nodes, same as I was with Cassandra 1.1.6. I'm using size-tiered compaction. Would changing to leveled with a large minimum make a big difference, or would it just push the problem off till later? Yeah, I have run into problems dropping schemas before as well. I was careful this time to start with an empty db folder… Glad you were successful in your transition :) Paul On Jul 24, 2013, at 4:12 AM, Christopher Wirt chris.w...@struq.com wrote: [...]
Re: disappointed
Hey Radim, I knew that it would take a while to stabilize, which is why I waited half a year before giving it a go. I guess I was just surprised that 6 months wasn't long enough… I'll have to look at the differences between 1.2 and 2.0. Is there a good resource for checking that? Your experience is less than encouraging… :) I am worried that if I stick with it, I'll have to invest time into learning the code base as well, and as a small startup, time is our most valuable resource… Thanks for the thoughts! Paul On Jul 24, 2013, at 6:42 AM, Radim Kolar h...@filez.com wrote: [...]
RE: disappointed
We found the performance of collections to be not great and needed a quick solution. We've always used the levelled compaction strategy, where you declare sstable_size_in_mb, not min_compaction_threshold. Much better for our use case. http://www.datastax.com/dev/blog/when-to-use-leveled-compaction We are read-heavy, latency-sensitive people: lots of TTL'ing, few writes compared to reads. From: Paul Ingalls [mailto:paulinga...@gmail.com] Sent: 24 July 2013 17:43 To: user@cassandra.apache.org Subject: Re: disappointed [...]
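For anyone following along, the switch Chris describes is a single schema change in 1.2-era CQL3 (table name and size are placeholders; see the linked blog post for how to choose the size):

    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};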
Re: unable to compact large rows
Would it be possible to delete this row and reinsert it? By the way, how large is that one row? Jason On Wed, Jul 24, 2013 at 9:23 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm getting constant exceptions during compaction of large rows. In fact, I have not seen one work, even starting from an empty DB. As soon as I start pushing in data, when a row hits the large threshold, it fails compaction with this type of stack trace:

INFO [CompactionExecutor:6] 2013-07-24 01:17:53,592 CompactionController.java (line 156) Compacting large row fanzo/tweets_by_id:352567939972603904 (153360688 bytes) incrementally
ERROR [CompactionExecutor:6] 2013-07-24 01:18:12,496 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.AssertionError: incorrect row data size 5722610 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_id/fanzo-tweets_by_id-tmp-ic-1453-Data.db; correct is 5767384
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

I'm not sure what to do or where to look. Help… :) Thanks, Paul
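The "Compacting large row ... incrementally" INFO line is governed by a cassandra.yaml threshold; rows above it take the incremental two-pass path where these assertions are firing. The setting, shown at its 1.2 default:

    # cassandra.yaml: rows larger than this (in MB) are compacted
    # incrementally on disk instead of entirely in memory
    in_memory_compaction_limit_in_mb: 64

Raising it trades memory for avoiding the incremental path, which can be a stopgap while the underlying bug is chased, though a ~150 MB row like the one in the log would need a sizeable limit.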
Re: disappointed
Same thing here... Since #5677 seems to affect a lot of users, what do you think about releasing a version 1.2.6.1? I can patch it myself, yeah, but do I want to push this into production? Hmm... On 24.07.2013 18:58, Paul Ingalls wrote: [...] -- Steffen Rusitschka, CTO, MegaZebra GmbH
Re: disappointed
On Wed, Jul 24, 2013 at 11:37 AM, Steffen Rusitschka r...@megazebra.com wrote: Same thing here... Since #5677 seems to affect a lot of users, what do you think about releasing a version 1.2.6.1? I can patch it myself, yeah, but do I want to push this into production? Hmm... A better solution would likely involve not running cutting-edge code in production.. if you find yourself needing to upgrade production anything on the day of a release, you are probably ahead of the version it is reasonable to run in production. If you're already comfortable with this high level of risk in production, I don't really see small manual patches as significantly increasing your level of risk... =Rob
Re: Decommission an entire DC
That one is documented -- http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html On Wed, Jul 24, 2013 at 3:33 AM, Cyril Scetbon cyril.scet...@free.fr wrote: [...]
Re: disappointed
Cassandra 2.0b2 changes: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.0.0-beta2-tentative "and as a small startup time is our most valuable resource…" Use the technology you are most familiar with.
Data disappear immediately after reading?
Hi all, I know the subject is not saying much, but this is what I'm experiencing now with my cluster. After some years without any problem, I'm now seeing problems with counters and, most seriously, data loss immediately after a read. I have some web services (WS) that I use to query data on Cassandra, and in the last month the following problem has happened twice: I call my WS, it shows data. I refresh the page -- the data is no longer available! I can then call the WS 200 times, but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes to erase these data. Anyone understand wth is going on? I have no idea, and most of all I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo
Re: Data disappear immediately after reading?
Sorry, I forgot to say: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that are disappearing are not counters but common rows. Original message From: cbert...@libero.it Date: 24/07/2013 22.34 To: user@cassandra.apache.org Subject: Data disappear immediately after reading? [...]
Re: Data disappear immediately after reading?
Carlo, Do you read/write with consistency levels according to your needs [1]? Have you tried to see if it happens when using the cassandra-cli to get that data? [1] http://wiki.apache.org/cassandra/ArchitectureOverview On Wed, Jul 24, 2013 at 5:34 PM, cbert...@libero.it wrote: [...]
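A quick way to take the web service out of the equation, per the suggestion above, is to read the row directly at QUORUM from cassandra-cli (keyspace, column family, and key are placeholders):

    $ cassandra-cli -h 127.0.0.1 -p 9160
    [default@unknown] use my_keyspace;
    [default@my_keyspace] consistencylevel as QUORUM;
    [default@my_keyspace] get my_cf['row-key'];

If the row is consistently readable this way but flickers through the WS, the client's consistency level or connection pooling is the first suspect.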
Re: Data disappear immediately after reading?
On Wed, Jul 24, 2013 at 1:34 PM, cbert...@libero.it cbert...@libero.itwrote: After some years without any problem now I'm experiencing problems with [not-actually-counters] but, the most serious problem, is data loss immediately after a read. Are secondary indexes involved? There are various bugs (including in 1.0.7) which have similar symptoms.. =Rob
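If a secondary index does turn out to be involved, rebuilding it is a common workaround for stale-index bugs; a hedged sketch with placeholder names (rebuild_index availability and its index-name argument form vary by version, so check nodetool help on your 1.0.x first):

    # rebuild the named secondary index(es) on one column family
    nodetool -h 127.0.0.1 rebuild_index my_keyspace my_cf my_cf.idx_name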
NPE during compaction in compare
Hey Chris, so I just tried dropping all my data and converting my column families to use leveled compaction. Now I'm getting exceptions like the following once I start inserting data. Have you seen these?

ERROR 13:13:25,616 Exception in thread Thread[CompactionExecutor:34,1,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:69)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:396)
    at org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:205)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180)
    at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:40)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:51)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:46)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)

and

ERROR 13:17:11,327 Exception in thread Thread[CompactionExecutor:45,1,main]
java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:396)
    at org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:205)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180)
    at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:40)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:51)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:46)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
Re: unable to compact large rows
It is pretty much every row that hits the large threshold. I don't think I can delete every row that hits that… you can see the db size in the stack trace, do you want a different type of size? On Jul 24, 2013, at 11:07 AM, Jason Wee peich...@gmail.com wrote: [...]
Re: sstable size change
Hi all, This morning I increased the SSTable size for one of my LCS column families via an alter command and saw at least one compaction run (I did not trigger a compaction via nodetool, nor run upgradesstables, nor remove the .json file). But so far my data file sizes appear to stay at the default 5 MB (see below for output of ls -Sal as well as the relevant portion of cfstats). Is this expected? I was hoping to see at least one file at the new 256 MB size I set. Thanks

SSTable count: 4965
SSTables in each level: [0, 10, 112/100, 1027/1000, 3816, 0, 0, 0]
Space used (live): 29062393142
Space used (total): 29140547702
Number of Keys (estimate): 195103104
Memtable Columns Count: 441483
Memtable Data Size: 205486218
Memtable Switch Count: 243
Read Count: 154226729

-rw-rw-r-- 1 cassandra cassandra 5247564 Jul 18 01:33 users-shard_user_lookup-ib-97153-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247454 Jul 23 02:59 users-shard_user_lookup-ib-109063-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247421 Jul 20 14:58 users-shard_user_lookup-ib-103127-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247415 Jul 17 13:56 users-shard_user_lookup-ib-95761-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247379 Jul 21 02:44 users-shard_user_lookup-ib-104718-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247346 Jul 21 21:54 users-shard_user_lookup-ib-106280-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247242 Jul 3 19:41 users-shard_user_lookup-ib-66049-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247235 Jul 21 02:44 users-shard_user_lookup-ib-104737-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247233 Jul 20 14:58 users-shard_user_lookup-ib-103169-Data.db

From: sankalp kohli kohlisank...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, July 23, 2013 3:04 PM To: user@cassandra.apache.org Subject: Re: sstable size change Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? Yes. Let it compact and increase in size. On Tue, Jul 23, 2013 at 9:38 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright kwri...@nanigans.com wrote: Can you elaborate on what you mean by let it take its own course organically? Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? You see, when two (or more!) SSTables love each other very much, they sometimes decide they want to compact together.. But seriously, yes. If you force all existing SSTables to level 0, it is as if you just flushed them all. Level compaction then does a whole lot of compaction, using the active table size. =Rob
Re: sstable size change
What is the output of show keyspaces from cassandra-cli? Do you see the new value?

Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
Compaction Strategy Options:
  sstable_size_in_mb: XXX

From: Keith Wright kwri...@nanigans.com To: user@cassandra.apache.org Sent: Wednesday, July 24, 2013 3:44 PM Subject: Re: sstable size change [...]
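Concretely, the verification looks something like this against Keith's schema (the size value is whatever was set in the alter):

    $ cassandra-cli -h 127.0.0.1
    [default@unknown] show keyspaces;
    ...
        ColumnFamily: shard_user_lookup
          Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
          Compaction Strategy Options:
            sstable_size_in_mb: 256

If the new value shows here, the schema change took, and the 5 MB files are simply pre-existing SSTables waiting to be rewritten by future compactions.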
Re: How to avoid inter-dc read requests
That does not measure what the servers are doing though. Track the number of reads per CF; it's exposed with nodetool cfstats and is in ops centre as well. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/07/2013, at 12:22 AM, Omar Shibli o...@eyeviewdigital.com wrote: I simply monitor the load avg of the nodes using opscenter. I started with idle nodes (by idle I mean a load avg below 1.0 on all nodes), then started to run a lot of key slice read requests on the analytic DC with CL local quorum (I also made sure that the client worked only with the analytic DC); after a few minutes I noticed that the load avg of all the nodes increased dramatically (above 10). Thanks in advance Aaron, On Tue, Jul 23, 2013 at 12:02 PM, aaron morton aa...@thelastpickle.com wrote: All the read/write requests are issued with CL local quorum, but still there are a lot of inter-DC read requests. How are you measuring this? Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/07/2013, at 8:41 AM, sankalp kohli kohlisank...@gmail.com wrote: Slice query does not trigger background read repair. Implement Read Repair on Range Queries On Sun, Jul 21, 2013 at 1:40 PM, sankalp kohli kohlisank...@gmail.com wrote: There can be multiple reasons for that: 1) Background read repairs. 2) Your data is not consistent, leading to read repairs. 3) For writes, irrespective of the consistency used, a single write request will go to the other DC. 4) You might be running other nodetool commands like repair. read_repair_chance (Default: 0.1 or 1) Specifies the probability with which read repairs should be invoked on non-quorum reads. The value must be between 0 and 1. For tables created in versions of Cassandra before 1.0, it defaults to 1. For tables created in versions of Cassandra 1.0 and higher, it defaults to 0.1. However, for Cassandra 1.0, the default is 1.0 if you use CLI or any Thrift client, such as Hector or pycassa, and is 0.1 if you use CQL. On Sun, Jul 21, 2013 at 10:26 AM, Omar Shibli o...@eyeviewdigital.com wrote: One more thing, I'm doing a lot of key slice read requests; is that supposed to change anything? On Sun, Jul 21, 2013 at 8:21 PM, Omar Shibli o...@eyeviewdigital.com wrote: I'm seeing a lot of inter-DC read requests, although I've followed the DataStax guidelines for multi-DC deployment http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers Here is my setup: 2 data centers within the same region (AWS); targeting DC, RF 3, 6 nodes; analytic DC, RF 3, 11 nodes. All the read/write requests are issued with CL local quorum, but still there are a lot of inter-DC read requests. Any suggestion, or am I missing something? Thanks in advance,
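To follow Aaron's suggestion, snapshot the per-CF read counters on a node in each DC before and after a test run; the delta shows where reads are actually landing (host and CF names are placeholders):

    # per-column-family counters, including Read Count
    nodetool -h analytic-node-1 cfstats | grep -A 20 'Column Family: my_cf'

A growing Read Count on targeting-DC nodes during an analytic-only workload would point at cross-DC reads, most likely from read repair; in 1.2-era CQL3 that can be biased toward the local DC with something like:

    ALTER TABLE my_keyspace.my_cf
      WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1;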
Re: NPE in CompactionExecutor
"There was no error stack, just that line in the log." It's odd that the stack is not there. This is an unhandled exception when running compaction. It may be related to the assertions. If you can reproduce it please raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 3:50 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm running the latest from the 1.2 branch as of a few days ago. I needed one of the patches that will be in 1.2.7. There was no error stack, just that line in the log. I wiped the database (deleted all the files in the lib dir) and restarted my data load, and am consistently running into the incorrect row data size error, almost immediately… It seems to be specific to compacting large rows. I have been unsuccessful in getting a large row to compact… Paul

On Jul 21, 2013, at 1:42 PM, aaron morton aa...@thelastpickle.com wrote: What version are you running?

ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main] java.lang.NullPointerException

What's the full error stack?

"Not sure if this is related or not, but I'm also getting a bunch of AssertionErrors as well, even after running a scrub…"

ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main] java.lang.AssertionError: incorrect row data size 29502477 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db; correct is 29725806

Double check that the scrub was successful. If it's not detecting/fixing the problem, look for previous log messages from that thread [CompactionExecutor:38] and see what sstables it was compacting. Try removing those. But I would give scrub another chance to get it sorted.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 20/07/2013, at 5:04 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm seeing a number of NullPointerExceptions in the log of my cluster. You can see the log line below. I'm thinking this is probably bad. Any ideas?
ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
java.lang.NullPointerException

Not sure if this is related or not, but I'm also getting a bunch of AssertionErrors as well, even after running a scrub…

ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
java.lang.AssertionError: incorrect row data size 29502477 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db; correct is 29725806
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
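To act on Aaron's suggestion of finding which sstables the failing thread was compacting and giving scrub another chance, something along these lines works. A sketch only; the log path is the common default and the fanzo/tweets_by_team names are taken from the error above, so adjust for your layout:

# find the compaction messages logged by the failing thread
grep 'CompactionExecutor:38' /var/log/cassandra/system.log

# re-run scrub scoped to the affected keyspace and column family
nodetool scrub fanzo tweets_by_team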
Re: funnel analytics, how to query for reports etc.
"Too bad Rainbird isn't open sourced yet!" It's been 2 years; I would not hold your breath. I remembered there are two time series open source projects out there:
https://github.com/deanhiller/databus
https://github.com/Pardot/Rhombus

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 4:00 AM, S Ahmed sahmed1...@gmail.com wrote: Thanks Aaron. Too bad Rainbird isn't open sourced yet!

On Tue, Jul 23, 2013 at 4:48 AM, aaron morton aa...@thelastpickle.com wrote: For background on rollup analytics: Twitter Rainbird http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 Acunu http://www.acunu.com/ Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 22/07/2013, at 1:03 AM, Vladimir Prudnikov v.prudni...@gmail.com wrote: This can be done easily. Use a normal column family to store the sequence of events, where the key is a session ID identifying one user interaction with the website, column names are TimeUUID values, and the column value is the id of the event (do not write something like "user added product to shopping cart"; use something shorter that identifies the event). Then you can use a counter column family to store counters; you can count anything: number of sessions, total number of events, number of particular events, etc. One row per day, for example. Then you can retrieve this row and calculate all the required percentages.

On Sun, Jul 21, 2013 at 1:05 AM, S Ahmed sahmed1...@gmail.com wrote: Would cassandra be a good choice for creating a funnel analytics type product similar to mixpanel? e.g. You create a set of events and store them in cassandra for things like:
event#1 user visited product page
event#2 user added product to shopping cart
event#3 user clicked on checkout page
event#4 user filled out cc information
event#5 user purchased product
Now in my web application I track each user and store the events somehow in cassandra (in some column family etc). Now how will I pull a report that produces results like:
70% of people added to shopping cart
20% checkout page
10% filled out cc information
4% purchased the product
And this is for a SaaS, so this report would be for thousands of customers in theory.

-- Vladimir Prudnikov
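Vladimir's layout translates to CQL3 fairly directly. A minimal sketch with hypothetical names (events, funnel_daily_counts) and an int for the event id; the uuid literal is just an example value:

CREATE TABLE events (
  session_id uuid,
  occurred timeuuid,
  event_id int,
  PRIMARY KEY (session_id, occurred)
);

-- one counter row per day; a multi-tenant SaaS would widen the key, e.g. (customer_id, day)
CREATE TABLE funnel_daily_counts (
  day text,
  event_id int,
  hits counter,
  PRIMARY KEY (day, event_id)
);

-- record event #2 (added to cart) for a session, and bump the daily counter
INSERT INTO events (session_id, occurred, event_id)
  VALUES (62c36092-82a1-3a00-93d1-46196ee77204, now(), 2);
UPDATE funnel_daily_counts SET hits = hits + 1 WHERE day = '2013-07-24' AND event_id = 2;

The report is then a single-row read (SELECT event_id, hits FROM funnel_daily_counts WHERE day = '2013-07-24';) with the percentages computed client side by dividing each count by the event #1 count.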
Re: high write load, with lots of updates, considerations? tombstoned data coming back to life
"I was watching some videos from the C* summit 2013 and I recall many people saying that if you can come up with a design where you don't perform updates on rows, that would make things easier (I believe it was because there would be less compaction)."

Not entirely true. There will always be compaction. But if you do updates there are overwrites, which means there is data on disk that is irrelevant and is not released until compaction gets to those files.

"Could old tombstoned data somehow come back to life? I forget what scenario brings about old data (kinda scary!)."

If you don't run repair on every node every gc_grace_seconds, there is a chance of it happening.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 4:22 AM, S Ahmed sahmed1...@gmail.com wrote: I was watching some videos from the C* summit 2013 and I recall many people saying that if you can come up with a design where you don't perform updates on rows, that would make things easier (I believe it was because there would be less compaction). When building an analytics (time series) app on top of C*, based on Twitter's Rainbird design (http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011), this means there will be lots and lots of counters. With lots of counters (updates), admin-wise, what are some things to consider? Could old tombstoned data somehow come back to life? I forget what scenario brings about old data (kinda scary!).
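The rule Aaron cites, repair every node within gc_grace_seconds, is normally enforced with a scheduled primary-range repair on each node. A sketch, assuming the default gc_grace_seconds of 864000 (10 days) and a crontab entry; the schedule is only an example:

# weekly primary-range repair, comfortably inside the 10-day gc_grace window
0 2 * * 0  nodetool repair -pr

Because -pr repairs only the token ranges this node is primarily responsible for, running it on every node covers the whole ring exactly once per cycle. If a node misses the window, tombstones it still holds may already have been compacted away on the other replicas, and the deleted data can be resurrected by repair or read repair; that is the scary scenario.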
Re: get all row keys of a table using CQL3
"I guess my question #1 is still there: does this query create a big load on the initial node that receives the request, because it still has to wait for all the results coming back from other nodes before returning to the client?"

Sort of. The coordinator always has to wait. Only one node will return the actual data; the others will return a digest of the data. So there is not huge memory pressure for this type of read. In general though you should page the results to reduce the size of the read.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 5:57 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi Blake, ah okay, the token function is nice. But I am still a bit confused by the phrase "page through all rows":

select id from mytable where token(id) > token(12345)

Will it return all rows whose partition key's corresponding token is greater than token(12345)? I guess my question #1 is still there: does this query create a big load on the initial node that receives such a request, because it still has to wait for all the results coming back from other nodes before returning to the client? thanks

On Tue, Jul 23, 2013 at 10:34 PM, Blake Eggleston bl...@grapheffect.com wrote: Hi Jimmy, Check out the token function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results You can use it to page through your rows. Blake

On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote: hi, I want to fetch all the row keys of a table using CQL3, e.g.:

select id from mytable limit 999

#1 For this query, does the node need to wait for all rows returned from all other nodes before returning the data to the client (I am using astyanax)? In other words, will this operation create a lot of load on the initial node receiving the request?
#2 If my table is big, I have to make sure the limit is set to a big enough number so that I can get all the results. It seems like I have to do a count(*) to be sure; is there any alternative (one that always returns all the rows)?
#3 If my id is a timeuuid, is it better to combine the results from a couple of the following CQL queries to obtain all keys? e.g.

select id from mytable where id > minTimeuuid('2013-02-02 10:00+') limit 2
+
select id from mytable where id < maxTimeuuid('2013-02-02 10:00+') limit 2

thanks
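Spelled out, the paging Blake and Aaron recommend is a loop of bounded queries rather than one giant LIMIT. A sketch, using the mytable name from the thread, an arbitrary page size of 1000, and a placeholder for the last id seen:

select id from mytable limit 1000;
-- remember the last id returned, then fetch the next page:
select id from mytable where token(id) > token(<last-id-of-previous-page>) limit 1000;
-- repeat until a page comes back with fewer than 1000 rows

Each page is a bounded read, so the coordinator only ever buffers one page of ids at a time, which addresses the load concern in question #1; it also removes the need to guess a large-enough LIMIT up front (question #2).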
Re: MapReduce response time and speed
"Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or so and more?"

Yes. More nodes will split the task up and it will run faster. How long it takes depends on the complexity of the hadoop tasks and the time they have to wait for slots.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 25/07/2013, at 4:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote: You have a lot of questions there so I can't answer all, but for the following:

"Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing, what is the expected response time?"

You can use high-level tools like Pig, Hive and Oozie. But mind you, it will depend on your data size, the complexity of the job, the cluster, and tuning parameters. Regards, Shahab

On Wed, Jul 24, 2013 at 10:33 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi, I am Jan Algermissen (REST-head, freelance programmer/consultant) and a Cassandra newbie. I am looking at Cassandra for an application I am working on. There will be a maximum of 10 million items (texts and attributes of a retailer's products) in the database. There will be occasional writes (e.g. price updates). The use case for the application is to work on the whole data set, item by item, to produce 'exports'. It will be necessary to access the full set every time. There is no relationship between the items. Processing is done iteratively.

My question: I am thinking that this is an ideal scenario for map-reduce, but I am unsure about two things: Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing, what is the expected response time? Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or so and more?

I understand that map-reduce is not for ad-hoc 'querying', but my users expect the system to feel quasi-interactive, because they intend to refine the processing job based on the results they get. A short gap would be ok, but a definite gap in the order of 10+ minutes would not. (For example, as far as I learned, with Riak you would most certainly have such a gap. How about Cassandra? Throwing more nodes at the problem would be ok; I just need to understand whether there is a definite 'response time penalty' I have to expect no matter what.) Jan
Re: Data disappear immediately after reading?
What sort of read are you making to get the data? There was a bug about secondary indexes being dropped if TTL was used: https://issues.apache.org/jira/browse/CASSANDRA-5079

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 25/07/2013, at 8:36 AM, cbert...@libero.it wrote: Sorry, I forgot to say: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that is disappearing is not counters but common rows.

Original message From: cbert...@libero.it Date: 24/07/2013 22.34 To: user@cassandra.apache.org Subject: Data disappear immediately after reading?

Hi all, I know the subject is not saying much, but this is what I'm experiencing now with my cluster. After some years without any problem, I'm now having problems with counters; but the most serious problem is data loss immediately after a read. I have some webservices that I use to query data on Cassandra, and in the last month the following problem has happened twice: I call my WS and it shows data. I refresh the page -- the data is no longer available! I can then call the WS 200 times but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes to erase these data. Does anyone understand what is going on? I have no idea, but most of all I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo