Re: disappointed
Yeah, Rob is smart. Don't run crap in production; run what others have found stable. If you are running the latest, greatest, dumbest, craziest code in prod, then you are asking for failure, and you will get just that: FAIL.

On Jul 24, 2013, at 12:06 PM, Robert Coli rc...@eventbrite.com wrote:

A better solution would likely involve not running cutting-edge code in production. [...]
Re: disappointed
Mysql?

-- Colin
+1 320 221 9531

On Jul 25, 2013, at 6:08 AM, Derek Andree dand...@lacunasystems.com wrote: [...]
RE: disappointed
Hi Paul,

Sorry to hear you're having a low point.

We ended up not using the collection features of 1.2, instead storing a compressed string containing the map and handling it client-side.

We only have fixed-schema short rows, so no experience with large-row compaction.

File descriptors have never got that high for us. But if you only have a couple of physical nodes with loads of data and small SSTables, maybe they could get that high?

The only time I've had file descriptors get out of hand was when compaction got slightly confused by a new schema, when I dropped and recreated instead of truncating: https://issues.apache.org/jira/browse/CASSANDRA-4857. Restarting the node fixed the issue.

From my limited experience I think Cassandra is a dangerous choice for a young start-up with limited funding/experience expecting to scale fast. We are a fairly mature start-up with funding. We've just spent 3-5 months moving from Mongo to Cassandra. It's been expensive and painful getting Cassandra to read like Mongo, but we've made it :)

From: Paul Ingalls [mailto:paulinga...@gmail.com]
Sent: 24 July 2013 06:01
To: user@cassandra.apache.org
Subject: disappointed

I want to check in. I'm sad, mad and afraid. I've been trying to get a 1.2 cluster up and working with my data set for three weeks with no success. I've been running a 1.1 cluster for 8 months now with no hiccups, but for me at least 1.2 has been a disaster. I had high hopes for leveraging the new features of 1.2, specifically vnodes and collections. But at this point I can't release my system into production, and will probably need to find a new back end. As a small startup, this could be catastrophic. I'm mostly mad at myself. I took a risk moving to the new tech. I forgot that sometimes when you gamble, you lose.

First, the performance of 1.2.6 was horrible when using collections. I wasn't able to push through 500k rows before the cluster became unusable. With a lot of digging, and way too much time, I discovered I was hitting a bug that had just been fixed, but was unreleased. This scared me, because the release was already at 1.2.6 and I would have expected something like https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed long before. But gamely I grabbed the latest code from the 1.2 branch, built it, and was finally able to get past half a million rows.

But then I hit ~4 million rows, and a multitude of problems. Even with the fix above, I was still seeing a ton of compactions failing, specifically the ones for large rows. Not a single large row will compact; they all assert with the wrong size. Worse, and this is what kills the whole thing, I keep hitting a wall with open files, even after dumping the whole DB, dropping vnodes and trying again. Seriously, 650k open file descriptors? When it hits this limit, the whole DB craps out and is basically unusable. This isn't that many rows. I have close to half a billion in 1.1.

I'm now at a standstill. I figure I have two options unless someone here can help me. Neither of them involves 1.2. I can either go back to 1.1 and remove the features that collections added to my service, or I can find another data backend that has similar performance characteristics to Cassandra but allows collection-type behavior in a scalable manner. Because as far as I can tell, 1.2 doesn't scale. Which makes me sad; I was proud of what I accomplished with 1.1.

Does anyone know why there are so many open file descriptors? Any ideas on why a large row won't compact?

Paul
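For illustration, Chris's workaround of storing a compressed string instead of a native map might look like the following minimal Python sketch (standard library only; the map is assumed JSON-serializable, and all names here are hypothetical, not code from this thread):

    import base64
    import json
    import zlib

    def pack_map(data):
        """Serialize a dict to compact JSON, compress it, and Base64-encode
        it so it fits in a single text column."""
        raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
        return base64.b64encode(zlib.compress(raw)).decode("ascii")

    def unpack_map(blob):
        """Reverse of pack_map: decode, decompress, parse."""
        return json.loads(zlib.decompress(base64.b64decode(blob)).decode("utf-8"))

    # Round trip:
    packed = pack_map({"likes": 42, "tags": ["a", "b"]})
    assert unpack_map(packed) == {"likes": 42, "tags": ["a", "b"]}

The trade-off is that every read and write moves the whole map, and you lose server-side element updates; in exchange you avoid the 1.2-era collection code paths entirely.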
Re: disappointed
Hi Paul,

Concerning large rows which are not compacting, I've probably managed to reproduce your problem. I suppose you're using collections, but also TTLs? Anyway, I opened an issue here: https://issues.apache.org/jira/browse/CASSANDRA-5799

Hope this helps.

2013/7/24 Christopher Wirt chris.w...@struq.com: [...]

--
Fabien Rousseau
aur...@yakaz.com
www.yakaz.com
Re: disappointed
From my limited experience I think Cassandra is a dangerous choice for a young start-up with limited funding/experience expecting to scale fast.

It's not dangerous; just do not try to be smart, and follow what other big Cassandra users like Twitter, Netflix, Facebook, etc. are using. If they are still on 1.1, then do not rush to 1.2. You can get all the information you need from GitHub and their Maven repos. The same method can be used for any other non-mainstream software, like Scala and Hadoop.

Also, every new Cassandra branch comes with an extensive number of difficult-to-spot bugs, and it takes about half a year to stabilize. New features should usually be avoided. Best is to stay one major version behind; this is true for almost any mission-critical software.

You can help by testing the Cassandra 2.0 beta. Create your test suite and run it against your target Cassandra version; the test suite also needs to track performance. From my testing, the performance of 2.0 is about the same as 1.2 for my workload.

I had a lot of problems after I migrated from a really well-working 0.8.x to 1.0.5. Even though preproduction testing did not discover any problems, there were memory leaks in 1.0.5, hint delivery was broken, and there was a problem with repair making old tombstones reappear, causing a snowball effect. That last one was fixed about a year later in mainstream C*, after I fixed it myself, because no dev believed such a thing could happen.
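Radim's suggested test suite (run against the target Cassandra version, tracking performance) can start very small. Below is a minimal sketch using the DataStax Python driver; the keyspace, table, and row count are hypothetical, and the IF NOT EXISTS keyspace syntax assumes Cassandra 2.0+:

    import time
    from cassandra.cluster import Cluster  # DataStax Python driver

    def smoke_test(contact_points, n_rows=1000):
        """Write and read back n_rows, returning wall-clock timings so
        runs against different Cassandra versions can be compared."""
        cluster = Cluster(contact_points)
        session = cluster.connect()
        session.execute("CREATE KEYSPACE IF NOT EXISTS perf_smoke WITH replication = "
                        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
        session.execute("CREATE TABLE IF NOT EXISTS perf_smoke.kv "
                        "(id int PRIMARY KEY, val text)")

        start = time.time()
        for i in range(n_rows):
            session.execute("INSERT INTO perf_smoke.kv (id, val) VALUES (%s, %s)",
                            (i, "x" * 100))
        write_secs = time.time() - start

        start = time.time()
        for i in range(n_rows):
            rows = list(session.execute("SELECT val FROM perf_smoke.kv WHERE id = %s", (i,)))
            assert rows, "row %d missing" % i
        read_secs = time.time() - start

        cluster.shutdown()
        return write_secs, read_secs

    if __name__ == "__main__":
        w, r = smoke_test(["127.0.0.1"])
        print("writes: %.2fs, reads: %.2fs" % (w, r))

A real suite would record these numbers per Cassandra version so regressions show up before an upgrade reaches production.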
Re: disappointed
Same type of error, but I'm not currently using TTLs. I am, however, generating a lot of tombstones as I add elements to collections…

On Jul 24, 2013, at 6:42 AM, Fabien Rousseau fab...@yakaz.com wrote: [...]
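One detail that may explain the tombstones Paul mentions: with CQL collections, overwriting a whole collection deletes the old contents first, which writes a tombstone, while adding individual elements to a set or map does not. A minimal sketch of the two write styles using the DataStax Python driver; the keyspace, table, and column names are hypothetical:

    from cassandra.cluster import Cluster  # DataStax Python driver

    session = Cluster(["127.0.0.1"]).connect("my_ks")  # hypothetical keyspace

    # Replacing the whole collection deletes the old contents first,
    # so every such update writes a tombstone:
    session.execute("UPDATE users SET tags = {'a', 'b'} WHERE id = 1")

    # Adding elements to a set (or map) updates in place with no
    # tombstone; only removals and whole-collection overwrites tombstone:
    session.execute("UPDATE users SET tags = tags + {'c'} WHERE id = 1")

If a client always rewrites the full collection, every update carries a tombstone, which would feed exactly the large-row compaction pain described in this thread.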
Re: disappointed
Hi Chris,

Thanks for the response!

What kind of challenges did you run into that kept you from using collections?

I'm currently running 4 physical nodes, same as I was with 1.1.6. I'm using size-tiered compaction. Would changing to leveled with a large minimum make a big difference, or would it just push the problem off till later?

Yeah, I have run into problems dropping schemas before as well. I was careful this time to start with an empty db folder…

Glad you were successful in your transition… :)

Paul

On Jul 24, 2013, at 4:12 AM, Christopher Wirt chris.w...@struq.com wrote: [...]
Re: disappointed
Hey Radim,

I knew that it would take a while to stabilize, which is why I waited half a year before giving it a go. I guess I was just surprised that 6 months wasn't long enough…

I'll have to look at the differences between 1.2 and 2.0. Is there a good resource for checking that?

Your experience is less than encouraging… :) I am worried that if I stick with it, I'll have to invest time into learning the code base as well, and as a small startup, time is our most valuable resource…

Thanks for the thoughts!

Paul

On Jul 24, 2013, at 6:42 AM, Radim Kolar h...@filez.com wrote: [...]
RE: disappointed
We found the performance of collections to not be great and needed a quick solution.

We've always used the levelled compaction strategy, where you declare sstable_size_in_mb, not min_compaction_threshold. Much better for our use case. http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

We are read-heavy, latency-sensitive people: lots of TTL'ing, few writes compared to reads.

From: Paul Ingalls [mailto:paulinga...@gmail.com]
Sent: 24 July 2013 17:43
To: user@cassandra.apache.org
Subject: Re: disappointed

[...]
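For anyone wanting to make the same switch, here is a minimal sketch of moving an existing table to leveled compaction, sent through the DataStax Python driver; the keyspace and table names are hypothetical, and 160 MB is only an example size, not a recommendation:

    from cassandra.cluster import Cluster  # DataStax Python driver

    session = Cluster(["127.0.0.1"]).connect("my_ks")  # hypothetical keyspace

    # Switch from size-tiered to leveled compaction; sstable_size_in_mb
    # is the knob Chris mentions. Existing SSTables get reorganized into
    # levels by background compaction after the change.
    session.execute("""
        ALTER TABLE events
        WITH compaction = {'class': 'LeveledCompactionStrategy',
                           'sstable_size_in_mb': 160}
    """)

See the DataStax post linked above for when leveled compaction is and isn't a win; it costs more compaction I/O in exchange for tighter read latency.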
Re: disappointed
Same thing here... Since #5677 seems to affect a lot of users, what do you think about releasing a version 1.2.6.1? I can patch it myself, yeah, but do I want to push this into production? Hmm...

On 24.07.2013 18:58, Paul Ingalls wrote: [...]

--
Steffen Rusitschka
CTO, MegaZebra GmbH
r...@megazebra.com
Re: disappointed
On Wed, Jul 24, 2013 at 11:37 AM, Steffen Rusitschka r...@megazebra.com wrote: [...]

A better solution would likely involve not running cutting-edge code in production. If you find yourself needing to upgrade production anything on the day of a release, you are probably ahead of the version it is reasonable to run in production. If you're already comfortable with this high level of risk in production, I don't really see small manual patches as significantly increasing your level of risk...

=Rob
Re: disappointed
cas 2.0b2: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.0.0-beta2-tentative

and as a small startup time is our most valuable resource…

Use the technology you are most familiar with.