Re: How is Cassandra being used?
On Wed, Nov 16, 2011 at 10:02 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like the consensus is that if this is a good idea at all, it needs to be opt-in. Like I said earlier, I can live with that. In addition, if you want to get data from large companies that manage their own datacenters, there needs to be a way to contribute data without the software phoning home automatically. We aren't allowed to make connections to the outside world from our datacenter, and I'm not willing to ask for an exception for this. A mode that dumps the data to a file which can then be uploaded would be preferable. People probably won't do it often, but imagine if your periodic "how are you using Cassandra?" email threads included data? -ryan
Re: Cassandra Pig with network topology and data centers.
It'd be great if we had different settings for inter- and intra-DC read repair. -ryan On Fri, Jul 29, 2011 at 5:06 PM, Jake Luciani jak...@gmail.com wrote: Yes, it's read repair; you can lower the read repair chance to tune this. On Jul 29, 2011, at 6:31 PM, Aaron Griffith aaron.c.griff...@gmail.com wrote: I currently have a 9 node cassandra cluster setup as follows: DC1: Six nodes DC2: Three nodes The tokens alternate between the two datacenters. I have hadoop installed as tasktracker/datanodes on the three cassandra nodes in DC2. There is another non-cassandra node that is used as the hadoop namenode / job tracker. When running pig scripts pointed to a node in DC2 using LOCAL_QUORUM as read consistency I am seeing network and cpu spikes on the nodes in DC1. I was not expecting any impact on those nodes when local quorum is used. Can read repair be causing the traffic/cpu spikes? The replication setting for DC1 is 5, and for DC2 it is 1. When looking at the map tasks I am seeing input splits for computers in both data centers. I am not sure what this means. My thought is that it should only be getting data from the nodes in DC2. Thanks Aaron
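Read repair chance in this era is tuned per column family. Below is a hedged sketch of the tuning Jake describes, using the cassandra-cli syntax of the 0.7/0.8 line; the column family name is illustrative, exact syntax may vary by version, and note there is no separate inter-DC knob, which is exactly Ryan's point:

```
update column family MyColumnFamily with read_repair_chance = 0.1;
```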
Re: set rpc_timeout_in_ms via jmx?
On Sat, Jul 16, 2011 at 12:30 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I don't see a way in DatabaseDescriptor to set the rpc_timeout_in_ms via jmx. It doesn't seem possible right now. Is there any reason why that couldn't be set via jmx? It seems like a rolling restart to update that is pretty heavy. It would be nice to set it in the yaml but also be able to change it via jmx, so an update could take effect immediately without a restart. Jeremy btw I'm trying to do that with my analytics nodes - hadoop jobs fail when cycling a single cassandra node - might be 2388 I guess. There's no way to do this currently, but we'd be interested in having it. Ideally, the timeout would be configurable per request so that mixed workloads can have different timeouts. -ryan
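The per-request idea at the end can be sketched as an optional override that falls back to a JMX-settable default. All names here (class and methods) are illustrative, not actual Cassandra code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a mutable global default (so JMX could change it
// at runtime without a rolling restart) plus a per-request override for
// mixed workloads. Not actual Cassandra code.
class RpcTimeouts {
    // Default comes from cassandra.yaml at startup.
    private final AtomicLong defaultTimeoutMs;

    RpcTimeouts(long defaultFromYamlMs) {
        this.defaultTimeoutMs = new AtomicLong(defaultFromYamlMs);
    }

    // The JMX setter equivalent: takes effect immediately.
    void setDefaultTimeoutMs(long ms) {
        defaultTimeoutMs.set(ms);
    }

    // A request may carry its own timeout; zero or negative means
    // "use the current default".
    long effectiveTimeoutMs(long requestOverrideMs) {
        return requestOverrideMs > 0 ? requestOverrideMs : defaultTimeoutMs.get();
    }
}
```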
Re: Cassandra 1.0
I think maybe 4 months was too short? Do we optimistically want to try that again or plan on taking a bit more time? Either way I'm happy to have a plan. :) -ryan On Thu, Jun 16, 2011 at 9:11 AM, Jonathan Ellis jbel...@gmail.com wrote: +1 On Thu, Jun 16, 2011 at 7:36 AM, Sylvain Lebresne sylv...@datastax.com wrote: Ladies and Gentlemen, Cassandra 0.8 is now out and we'll hopefully soon have the first minor release on that branch out too. It is now time to think of the next iteration, aka Apache Cassandra 1.0 (sounds amazing...). The 0.8 release was our first release on our new fixed 4 months release schedule. 0.7.0 was released January 9th and 0.8.0 was released just 4 months later, June 8th. Alright, alright, that's 5 months, but close enough for a first time. Sticking to that 4 months schedule, I propose the following deadlines: - September 8th: feature freeze - October 8th: release (tentative date) -- Sylvain -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
dapper-style tracing in cassandra
I'll open a ticket on this soon, but I'd like to start a discussion first. We're working on a distributed tracing system, whose design is somewhat inspired by the Google Dapper paper [1]. We have instrumented a bunch of our internal services through our custom networking stack [2]. In a nutshell, the way it works is that each request is given a trace id which gets passed through to each service involved in servicing that request. Each hop in that tree is given a span id. Each node logs its data to a local agent (we use scribe for this). An aggregator can pull the pieces back together so you can do analysis. I'd like to add the ability to plug tracers into cassandra. Like many parts of Cassandra, I think we should make this an extensible point with a good default implementation in place. Here's what I propose: 1. Update the thrift server to allow clients to pass in tracing details. I'll have docs soon on how we're doing this internally. 2. Add the necessary metadata to each message passed between cassandra nodes. This should be easy to add to Message.java and thread through to the places we need it. 3. Implement a universally useful version of this: one that's not dependent on our system, since it may not ever get open-sourced. Perhaps writing to local files? Thoughts? Opinions? -ryan 1. http://research.google.com/pubs/pub36356.html 2. https://github.com/twitter/finagle/tree/master/finagle-b3
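The propagation model described above (trace id constant across the whole request tree, a fresh span id per hop, and a pluggable sink) can be sketched as follows. All names are hypothetical, not an actual Cassandra or Finagle API:

```java
import java.util.UUID;

// Hypothetical sketch of the Dapper-style context described above.
final class TraceContext {
    final UUID traceId;  // constant across the whole request tree
    final UUID spanId;   // unique per hop

    TraceContext(UUID traceId, UUID spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    // Entering a new hop: keep the trace id, mint a fresh span id.
    TraceContext childSpan() {
        return new TraceContext(traceId, UUID.randomUUID());
    }
}

// The pluggable extension point: site-specific sinks (e.g. scribe)
// would implement this.
interface Tracer {
    void record(TraceContext ctx, String event);
}

// A "universally useful" default sink, standing in for writing to a
// local file or agent.
class StdoutTracer implements Tracer {
    public void record(TraceContext ctx, String event) {
        System.out.printf("%s %s %s%n", ctx.traceId, ctx.spanId, event);
    }
}
```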
Re: dapper-style tracing in cassandra
On Tue, Jun 14, 2011 at 2:02 PM, Jonathan Ellis jbel...@gmail.com wrote: Sounds a lot like https://issues.apache.org/jira/browse/CASSANDRA-1123. The main change you'd want is to allow passing an external trace ID. Yeah, that patch seems like a good start. In addition to passing an external trace id we need a way to plug in our implementation of what to do with the data (we want to publish thrift structs through scribe). -ryan
Re: Updating cassandra RubyGem for 0.8 CQL
This is awesome. I'll work to get it merged. -ryan On Wed, Apr 27, 2011 at 8:36 PM, Robert Jackson robe...@promedicalinc.com wrote: I have just finished these updates. The following changes/new features have been made: * Update Rakefile to install 0.6.13, 0.7.4, 0.8.0-beta1 to ~/cassandra/cassandra-VERSION * Add data:load task to Rakefile for creating the schema required for the tests * Default the Rakefile to use 0.8-beta1 * Setup test suite to work on 0.6.13, 0.7.4, and 0.8.0-beta1 * All tests pass for all supported (0.6.13, 0.7.4, 0.8.0-beta1) versions. * Added Support for 0.8-beta1 * Changed get_index_slices to return a hash of rows * Updated Cassandra::Mock to pass all tests for each Cassandra version Please review my changes at: https://github.com/rjackson/cassandra I have submitted a pull request to the main fauna/cassandra repo on Github. The next round of updates will be to add an additional version for CQL. Robert Jackson - Original Message - From: Robert Jackson robe...@promedicalinc.com To: client-dev@cassandra.apache.org Sent: Saturday, April 23, 2011 12:58:03 AM Subject: Updating cassandra RubyGem for 0.8 CQL I have been working on a local fork of the fauna/cassandra rubygem to add support for 0.8. I am a relative newcomer to Cassandra in general, and working on the internals of the client has really helped. To make sure that I didn't lose any ground with other versions of Cassandra, I updated the test suite so that it can run tests against 0.6, 0.7, and 0.8. This works by setting a CASSANDRA_VERSION env variable before calling the normal rake or bin/cassandra_helper scripts. Run the cassandra version with: CASSANDRA_VERSION=0.8 rake cassandra Then you can run the tests with: CASSANDRA_VERSION=0.8 rake I still have a ways to go with getting all the tests passing, but at this point 0.8 and 0.6 have around 6 failures and 8 errors. (I am struggling with a schema loading issue with 0.7.4.)
I hope to have the tests all passing in the next couple of days, and hopefully we can get the changes pushed upstream. Then I am going to start fleshing out the CQL version (which hopefully shouldn't be such a moving target between Cassandra versions). I would certainly appreciate any feedback on my work so far. https://github.com/rjackson/cassandra/tree/cassandra_0.8 Robert Jackson
Re: Maintenance releases
On Fri, Feb 11, 2011 at 8:35 AM, Gary Dusbabek gdusba...@gmail.com wrote: I've been uncomfortable with the number of features I perceive are going into our maintenance releases for a while now. I thought it would stop after we committed ourselves to having a more predictable major release schedule. But getting 0.7.1 out feels like it's taken a lot more effort than it should have. I wonder if part of the problem is that we've been committing destabilizing features into it? IMO, maintenance releases (0.7.1, 0.7.2, etc.) should only contain bug fixes and *carefully* vetted features. I've scanned down the list of 0.7.1 changes in CHANGES.txt and about half of them are features that I think could have stayed in trunk. I think we did this a lot with the early maintenance releases of 0.6 as well, probably in an effort to get features out *now* instead of waiting for an 0.7 that was not happening soon enough. We've decided to pick up the pace of our major release schedule (sticking to four months). I think maintaining this pace will be difficult if we continue to commit as many features into the minor releases as we have been. I'm willing to concede that I may have an abnormally conservative opinion about this. But I wanted to voice my concern in hopes we can improve the quality and delivery of our maintenance releases. I agree with you. We've tried both approaches and I believe that it's clear that releasing features in maintenance releases leads to more pain and unpredictability. -ryan
Re: Does Ruby library returns the RowKey?
On Thu, Feb 10, 2011 at 2:17 AM, Joshua Partogi joshua.j...@gmail.com wrote: Hi, Does the Ruby library currently return the RowKey during a row get? From what I am seeing it seems like it is only returning an OrderedHash of the columns. Would it be possible to return the RowKey, or does it not make sense to do so? Which method are you talking about? It doesn't make sense to return the row key on a get or get_slice, but it does for multiget and company (which it should already). -ryan
Re: Monitoring Cluster with JMX
If you're using 0.7, I'd skip jmx and use the mx4j http interface, then write scripts that convert the data to the format you need. -ryan On Wed, Feb 9, 2011 at 2:47 AM, Roland Gude roland.g...@yoochoose.com wrote: Unfortunately not, as the Nagios JMX check expects a numeric return value and only allows for defining thresholds for issuing warnings or errors depending on that value. It does not allow for post-processing the return values. roland From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, February 8, 2011 21:32 To: dev@cassandra.apache.org Subject: Re: Monitoring Cluster with JMX Can't you get the length of the list on the monitoring side of things? aaron On 08 Feb, 2011, at 10:25 PM, Roland Gude roland.g...@yoochoose.com wrote: Hello, we are trying to monitor our cassandra cluster with Nagios JMX checks. While there are JMX attributes which expose the list of reachable/unreachable hosts, it would be very helpful to have additional numeric attributes exposing the size of these lists. This could be used to set thresholds (in Nagios monitoring), e.g. at least 3 hosts must be reachable before Nagios issues a warning. This is probably not hard to do and we are willing to implement/supply patches if someone could point us in the right direction on where to implement it. Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln -- -@rk
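Roland's request boils down to adding a numeric companion attribute next to each list-valued one, so a threshold-only tool like the Nagios JMX check has something to compare against. A minimal sketch of the MBean shape, with illustrative names (in Cassandra the list attributes in question are the likes of StorageService's getLiveNodes/getUnreachableNodes):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative MBean shape: alongside the list of hosts, expose its
// size as a plain number that monitoring checks can threshold on.
interface ClusterHealthMBean {
    List<String> getReachableNodes();   // existing list-valued attribute
    int getReachableNodeCount();        // proposed numeric companion
}

class ClusterHealth implements ClusterHealthMBean {
    private final List<String> reachable = new CopyOnWriteArrayList<>();

    void markReachable(String host) {
        reachable.add(host);
    }

    public List<String> getReachableNodes() {
        return reachable;
    }

    public int getReachableNodeCount() {
        return reachable.size();
    }
}
```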
Re: Proposal: fixed release schedule
On Thu, Jan 20, 2011 at 8:39 AM, Eric Evans eev...@rackspace.com wrote: On Wed, 2011-01-19 at 10:29 -0600, Jonathan Ellis wrote: On Tue, Jan 18, 2011 at 12:36 PM, Eric Evans eev...@rackspace.com wrote: The discussion seems to be petering out and I wonder if that means folks are still trying to wrap their heads around everything, or if we have consensus. If we're in agreement on 4 months between releases, and feature-freezing branches in the run-up, then that would leave us with say 7 weeks (give or take) to land everything in trunk that we expect in the next release (and I would think that at this point we'd at least have a good idea what that'd be). Sounds good. I've assigned to the Riptano/Datastax team the issues we can get to in OK, then I'm going to assume we have consensus on this. So again, we released on Jan 9th, 4 months (nominally) would give us a release date of May 9th. We need a few weeks for testing and bug fixing, say time enough for a couple beta iterations, so let's set a tentative date of April 9 to branch (just under 7 weeks from now). As of right now I see 71 issues marked 0.8. I haven't been through all of them, some are trivial or have patches attached, but some are no doubt unrealistic considering the timeline. For any issues you're championing, please take some time over the next couple of weeks to make sure the ones marked fixfor-0.8 match what you can accomplish before we branch. Is that reasonable to everyone? Seems reasonable to me, though I think the release date can be a bit more flexible (while the freeze date shouldn't be). In other words, if we feature freeze and branch on April 9th and then we're ready to ship before May 9th, we should just go ahead and ship. I'm guessing that we'll have to cut a bunch of scope in order to make this happen. -ryan
Re: Time for 1.0
On Thu, Jan 13, 2011 at 7:32 PM, Jonathan Ellis jbel...@gmail.com wrote: ... In other words, at some point you have so many production users that it's silly to pretend it's ready for 1.0. I'd say we've passed that point. Did you mean to say silly to pretend it's *not* ready for 1.0? Otherwise, I don't understand. I'm on board with this, to the point that Riptano is hiring a full-time QA engineer to contribute here. Like I said at the outset, I don't care so much about what the version is called as long as the quality continues to improve. -ryan
Re: Time for 1.0
I'm a -1 on naming the next release 1.0 because I don't think it has the quality that 1.0 implies, but to be honest I don't really care that much. The version numbers don't really affect those of us who are running production clusters. Calling it 1.0 won't make it any more stable or faster. Also, before we say that everything people want in 1.0 is done, perhaps we need to do that survey again. A lot of people have joined the community since 0.5 days and their needs should probably be considered in this situation. Also, those of us who've been around have new things we care about. Of course this will always be true and at some point we need to draw a line in the sand and put the 1.0 stamp on it. I just feel that time has not come yet (but, like I said, I don't really care that much because it won't affect me). Regardless of what we call the next major release there are at least 2 things I'd like to see happen: 1. make the distributed test suite more reliable (it's admittedly flaky on ec2) and flesh it out to include all distributed functionality. We shouldn't run a distributed system without distributed tests. We'll work on the flakiness, but we need people to write tests (and reviewers to require tests). 2. I think we should change how we plan releases. I'll send another email about this soon. -ryan On Tue, Jan 11, 2011 at 5:35 PM, Jonathan Ellis jbel...@gmail.com wrote: Way back in Nov 09, we did a users survey and asked what features people wanted to see. Here was my summary of the responses: http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html Looking at that, we've done essentially all of them. I think we can make a strong case that our next release should be 1.0; it's production ready, it's reasonably feature-complete, it's documented, and we know what our upgrade path story is.
The list-- Load balancing: basics done; https://issues.apache.org/jira/browse/CASSANDRA-1427 is open to improve it Decommission: done Map/reduce support: done ColumnFamily / Keyspace definitions w/o restart: done Design documentation: started at http://wiki.apache.org/cassandra/ArchitectureInternals Insert multiple rows at once: done Remove_slice_range / remove_key_range: turned out to be a *lot* harder than it looks at first. Postponed indefinitely. Secondary indexing: done Caching: done (with some enhancements possible such as https://issues.apache.org/jira/browse/CASSANDRA-1969 and https://issues.apache.org/jira/browse/CASSANDRA-1956) Bulk delete (truncate): done I would add, User documentation: done (http://www.riptano.com/docs) Large row support: done Improved replication strategies and more sophisticated ConsistencyLevels: done Efficient bootstrap/streaming: done Flow control: done Network-level compatibility between releases: scheduled (https://issues.apache.org/jira/browse/CASSANDRA-1015) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Proposal: fixed release schedule
I think many believe that shipping 0.7 took longer than it should have. Rather than going into why that happened, I'd like to propose a better way to move forward that will hopefully allow us to ship on a more predictable schedule. This proposal is heavily influenced by the google chrome release process: http://www.scribd.com/doc/46659928/Chrome-Release-Cycle-12-16-2010 ...which is heavily influenced by how large websites deploy code (everyone close to trunk, hide incomplete changes behind configuration flags, etc.) I'm not saying we should adopt this process as-is, but some aspects of it seem like they would be valuable: # Fixed schedule We should set a fixed schedule and stick to it. Any features not ready at branch time won't make it and will be disabled in the stable branch. # Trunk-first Everyone on chrome commits to trunk first. I think the important change we could make is to keep everyone closer to trunk. We spend a good deal of effort back-porting patches between major versions. I think we should make the major versions less different. This would mean letting them live for shorter amounts of time and possibly making them bugfix only. Currently we add new features in stable branches, but I think if we made the major release schedule more predictable people would be more comfortable with letting their new feature wait until the next major version. We should be more liberal about committing things to trunk early and iterating on them there (rather than iterating on them in patches). If the features are unstable we can either hide them behind configuration flags or remove them when we cut a stable branch. # Automate all tests I think the only way that we can keep people close to trunk and stay stable is to build automated tests for *everything*. All code should be exercised by thorough unit tests and distributed black-box tests. Every regression should get a test. Chrome has a 6 week cycle. I think ours would be more like 4 months for major releases.
Whatever we do, I think the schedule needs to be more predictable, which means that the contents of each release will be less predictable (since it's whatever's ready at the appointed time). As the Chrome presentation mentioned, the idea isn't raw speed, but predictable release schedules. Feedback please. -ryan
Re: Proposal: fixed release schedule
To be more clear, here's what I think is broken in the current release planning: 1. The dates are wildly unpredictable. 2. People aren't allowed to work against trunk on features for multiple iterations (see #1072). 3. Stable branches diverge too much, causing duplicated effort. (We essentially implemented #1072 twice, for 0.6 and 0.7.) 4. Back-porting features is risky and causes bugs, especially with the limited QA available. -ryan On Thu, Jan 13, 2011 at 2:32 PM, Ryan King r...@twitter.com wrote: I think many believe that shipping 0.7 took longer than it should have. Rather than going into why that happened, I'd like to propose a better way to move forward that will hopefully allow us to ship on a more predictable schedule. This proposal is heavily influenced by the google chrome release process: http://www.scribd.com/doc/46659928/Chrome-Release-Cycle-12-16-2010 ...which is heavily influenced by how large websites deploy code (everyone close to trunk, hide incomplete changes behind configuration flags, etc.) I'm not saying we should adopt this process as-is, but some aspects of it seem like they would be valuable: # Fixed schedule We should set a fixed schedule and stick to it. Any features not ready at branch time won't make it and will be disabled in the stable branch. # Trunk-first Everyone on chrome commits to trunk first. I think the important change we could make is to keep everyone closer to trunk. We spend a good deal of effort back-porting patches between major versions. I think we should make the major versions less different. This would mean letting them live for shorter amounts of time and possibly making them bugfix only. Currently we add new features in stable branches, but I think if we made the major release schedule more predictable people would be more comfortable with letting their new feature wait until the next major version.
We should be more liberal about committing things to trunk early and iterating on them there (rather than iterating on them in patches). If the features are unstable we can either hide them behind configuration flags or remove them when we cut a stable branch. # Automate all tests I think the only way that we can keep people close to trunk and stay stable is to build automated tests for *everything*. All code should be exercised by thorough unit tests and distributed black-box tests. Every regression should get a test. Chrome has a 6 week cycle. I think ours would be more like 4 months for major releases. Whatever we do, I think the schedule needs to be more predictable, which means that the contents of each release will be less predictable (since its whatever's ready at the appointed time). Like the Chrome presentation mentioned the idea isn't raw speed, but predictable release schedules. Feedback please. -ryan
Re: Proposal: fixed release schedule
On Thu, Jan 13, 2011 at 4:04 PM, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Jan 13, 2011 at 2:32 PM, Ryan King r...@twitter.com wrote: # Fixed schedule We should set a fixed schedule and stick to it. Anything features not ready at branch time won't make it and will be disabled in the stable branch. I like this idea, as long as we're willing to be flexible when warranted. Sometimes it is less work to finish a feature than to rip it out. Two things: First, I think a key part of how you make this successful (both for Chrome and for continuously deployed software like large services) is that non-trivial changes almost always have to be hidden behind flags until they're ready for wide use. Second, I think this will only work well if we are somewhat strict about it. # Trunk-first Everyone on chrome commits to trunk first. I suppose that's fine if it works for them, but it's not The One True Way. Changes that affect both stable and trunk branches should really be applied to stable first and merged forward. Here is a good presentation explaining why: http://video.google.com/videoplay?docid=-577744660535947210. Another reason is that committing fixes to a stable branch and then using svn merge branch from trunk means svn tracks everything that has been committed to branch and not yet to trunk, and merges it in. So it protects us somewhat against people committing a fix to trunk, then forgetting to commit to the stable branch. I guess I don't care as much about the mechanics of this as the intent, which is to keep stable and trunk closer together. And to keep people working on a more common base. I think the important change we could make is to keep everyone closer to trunk. We spend a good deal of effort back-porting patches between major versions. I think we should make the major versions less different. This would mean letting them live for shorter amounts of time and possibly making them bugfix only. In theory I agree (see: policy for 0.4 and 0.5 stable releases).
In practice, users overwhelmingly wanted more than that in between major releases. Not that users are always right, but this is an area where I think they are worth listening to. :) Perhaps minor things are worth adding in a stable branch. I think this is an area where judgement can come into play. I think if we made the major release schedule more predictable people would be more comfortable with letting their new feature wait until the next major version. In my experience it's not the unpredictability as much as I'm feeling this pain Right Now and four months is too long to wait. Perhaps waiting is the biggest pain for users, but for developers unpredictability is just as big a problem. If I don't know when release N+1 is going to happen I might try to hurry to get my feature into release N. If I have confidence that release N+1 will come promptly at a scheduled time I can set my expectation appropriately. We should be more liberal about committing things to trunk early and iterating on them there (rather than iterating on them in patches). I agree in the sense that we were too slow to branch 0.7 to have an open trunk to start work on. But I disagree in the sense that we shouldn't be committing works-in-progress to trunk because that becomes the baseline everyone else has to develop from. (I know at least one team with a nontrivial patchset against trunk from the 0.7 beta1 timeframe, back when it had the Clock struct that we committed prematurely.) So we made a mistake once. :) I think committing large changes in smaller pieces will be a net positive, even if it occasionally trips us up. For example, I think work on counters between our team and Sylvain has improved dramatically once we committed 1072. 
IMO the right fix is to help the ASF make git an option; in the meantime the best workaround is a git-based workflow with git-jira-attacher and git-am as described in http://spyced.blogspot.com/2009/06/patch-oriented-development-made-sane.html and http://wiki.apache.org/cassandra/GitAndJIRA. So you're proposing that we use git to keep long-running feature branches? # Automate all tests I think the only way that we can keep people close to trunk and stay stable is to build automated tests for *everything*. All code should be exercised by thorough unit tests and distributed black-box tests. Every regression should get a test. Agreed. Chrome has a 6 week cycle. I think ours would be more like 4 months for major releases. Four months feels about right to me, too, although for 0.7 + 1 I'd like to make it a bit shorter (beginning of April?) since we have several features (1072 being the most prominent) that just barely missed 0.7. Like I said, we're going to have to figure out the right pace, but we should try and stick to it. Whatever we do, I think the schedule needs to be more predictable, which means that the contents of each release will be less
Re: [VOTE] 7.0
+1 non-binding -ryan On Thu, Jan 6, 2011 at 10:24 AM, Jonathan Ellis jbel...@gmail.com wrote: +1 for reals On Jan 6, 2011 11:14 AM, Eric Evans eev...@rackspace.com wrote: RC 4 seems to be holding up OK, shall we? I propose the following for release as 0.7.0 (aka For Reals Yo). SVN: https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0@r1055934 0.7.0 artifacts: http://people.apache.org/~eevans The vote will be open for at least 72 hours. P.S. Don't forget that there is still a vote open for 0.6.9 [1]: http://goo.gl/uT89p (CHANGES.txt) [2]: http://goo.gl/Bi8LD (NEWS.txt) [3]: http://goo.gl/MHe1z (True Grit(s)) -- Eric Evans eev...@rackspace.com
Re: Coordinated testing for 0.7
I'd be happy to host a hackathon at Twitter HQ in SF for this. Anyone interested in that? -ryan On Wed, Dec 1, 2010 at 7:18 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Perhaps the time could be better spent trying to beef up the integration tests and looking for ways to root out potential regressions... Back in September a handful of us in the Austin/San Antonio area did an Avro hackathon to get functional parity between thrift and avro. I wonder if there could be a day set aside to do something to contribute to testing out 0.7 - unit/integration test additions would be beneficial long-term. Anyway, it could be coordinated by one or a small number of people so that there isn't duplication - something like that. I know several have spent long hours already making it solid. Just trying to brainstorm ways to get some additional good contributions in the core for making for a more solid 0.7.0 release. Again... any thoughts? On Dec 1, 2010, at 3:24 PM, Jeremy Hanna wrote: I was wondering if there was a coordinated plan for testing the 0.7 release. I realize that testing is ultimately up to the individual team. However, with 0.7 there are a _lot_ of significant changes and I wondered if there was interest in coordinating efforts to do more extensive testing above and beyond the integration tests and things built into the source tree currently. I think https://issues.apache.org/jira/browse/CASSANDRA-874 is also relevant for furthering the integration tests. Any thoughts?
Distributed Counters Use Cases
In the spirit of making sure we have clear communication about our work, I'd like to outline the use cases Twitter has for distributed counters. I expect that many of you using Cassandra currently or in the future will have similar use cases. The first use case is pretty simple: high scale counters. Our Tweet button [1] is powered by #1072 counters. We count every mention of every url that comes through a public tweet. As you would expect, there are a lot of urls and a lot of traffic to this widget (it's on many high-traffic sites, though it is highly cached). The second is a bit more complex: time series data. We have built infrastructure that can process logs (in real time from scribe) or other events and convert them into a series of keys to increment, buffer the data for 1 minute and increment those keys. For logs, each aggregator would do its own increment (so per thing you're tracking you get an increment from each aggregator), but for events it'll be one increment per event. We plan to open source all of this soon. We're hoping to soon start replacing our ganglia clusters with this. For the ganglia use-case we end up with a large number of increments for every read. For monitoring data, even a reasonably sized fleet with a moderate number of metrics can generate a huge amount of data. Imagine you have 500 machines (not how many we have) and measure 300 (a reasonable estimate based on our experience) metrics per machine. Suppose you want to measure these things every minute and roll the values up every hour, day, month and for all time. Suppose also that you were tracking sum, count, min, max, and sum of squares (so that you can do standard deviation). You also want to track these metrics across groups like web hosts, databases, datacenters, etc.
These basic assumptions would mean this kind of traffic: (500 machines + 100 groups) * 300 metrics * 5 aggregates * 4 time granularities = 3,600,000 increments/minute. Read traffic, being employee-only, would be negligible compared to this. One other use case is that for many of the metrics we track, we want to track the usage across several facets. For example [2], to build our local trends feature, you could store a time series of terms per city. In this case supercolumns would be a natural fit because the set of facets is unknown and open. Imagine a CF that has data like this: city0 = hour0 = { term1 = 2, term2 = 1000, term3 = 1}, hour1 = { term5 = 2, term2 = 10} city1 = hour0 = { term12 = 3, term0 = 500, term3 = 1}, hour1 = { term5 = 2, term2 = 10} Of course, there are some other ways to model this data: you could collapse the subcolumn names into the column names and re-do how you slice (you have to slice anyway). You have to have fixed-width terms then, though: city0 = { hour0 + term1 = 2, hour0 + term2 = 1000, hour0 + term3 = 1, hour1 + term5 = 2, hour1 + term2 = 10} city1 = { hour0 + term12 = 3, hour0 + term0 = 500, hour0 + term3 = 1, hour1 + term5 = 2, hour1 + term2 = 10} This is doable, but could be rough. The other option is to have a separate row for each facet (with a compound key of [city, term]) and build a custom partitioner that only looks at the first part when generating the token; then we have to do range slices to get all the facets. Again, doable, but not pretty. -ryan 1. http://twitter.com/goodies/tweetbutton 2. this is not how we actually do this, but it would be a reasonable approach.
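The collapse-into-column-names option can be sketched with an in-memory sorted map standing in for one row. The encoding here is illustrative: a zero-padded hour prefix keeps lexical order aligned with time, and a delimiter is used in place of the strictly fixed-width terms the email mentions. The payoff is that one hour's terms are contiguous, so a single slice per hour works:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of collapsing (hour, term) into one column name.
class FacetedCounters {
    // One sorted "row" per city; column name -> count.
    final TreeMap<String, Long> row = new TreeMap<>();

    // "0001:term5" -- zero-padding makes lexical order match numeric order.
    static String name(int hour, String term) {
        return String.format("%04d:%s", hour, term);
    }

    void increment(int hour, String term, long by) {
        row.merge(name(hour, term), by, Long::sum);
    }

    // Slice all terms for one hour, like a column slice on the row.
    // ';' sorts just after ':', so ["0001:", "0001;") covers hour 1 exactly.
    SortedMap<String, Long> sliceHour(int hour) {
        return row.subMap(String.format("%04d:", hour), String.format("%04d;", hour));
    }
}
```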
Re: [DISCUSSION] High-volume counters in Cassandra
Sorry, been catching up on this. From Twitter's perspective, 1546 is probably insufficient because it doesn't allow one to do time-series data without supercolumns (which might work ok, but would require a good deal of work). Additionally, one of our deployed systems already does supercolumns of counters, which is not feasible in this design at all. -ryan On Tue, Sep 28, 2010 at 10:12 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Is there any feedback from Twitter and Digg and perhaps SimpleGeo people about CASSANDRA-1546? Would that work so that you wouldn't have to maintain a fork? On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote: In CASSANDRA-1546, I propose an alternative to #1072. At its core, it rewrites #1072 without the clocks structure (by splitting the clock into individual columns, not unlike what Zhu Han proposed in his preceding mail, but in a row instead of a super column, for reasons explained in the issue). But it is also my belief that it improves on the actual patch of #1072 in the following ways:
- it supports increments and decrements
- it supports the usual consistency levels
- it proposes an (optional) solution to the idempotency problem of increments (it's optional because it has a (fairly slight) performance cost that some may want to avoid if they understand the risk)
When I say "I propose", I mean that I did write the patch (attached to the jira ticket). I've just written it, so it is really under-tested and has a few details here and there to fix, but it should already be fairly functional (it passes basic system tests). I welcome all comments on the patch. It has been written with the goal of addressing most of the concerns that have been raised about these counters over the past few months (both in terms of performance and implementation). It is my belief that it reaches this goal; hopefully others will agree.
-- Sylvain On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han schumi@gmail.com wrote: I propose a new way to solve the counter problem in cassandra-1502 [1]. Since I do not follow the jira updates very carefully, I paste it here so more people can comment on it and we can see whether it's feasible. It seems we have not found a solution acceptable to everybody, so I am trying to propose a new approach. Let's see whether anybody can shed some light on it and make it a reality.

1) We add a basic data structure, called a counter, which is a special type of super column.
2) The name of each column in the counter super column is the host name of a cassandra node, and the value is the calculated result from that node.
3) WRITE PATH: Once a node receives the add/dec request for a counter, it de-serializes its local counter super column and atomically updates the column named after itself. After that, it propagates the updated column value to the other replicas, just like the mutation of a normal column is propagated. Different consistency levels can be supported as before.
4) READ PATH: Depending on the consistency level, contact several replicas, read back the counter super column as a whole, and get the latest counter value by summing up all columns in the counter. Read-repair logic can work as before.

IMHO, the biggest advantage of this approach is re-using as many mechanisms already in the code as possible, so it might not be so disruptive. But adding a new thrift API is inevitable. NB: If it's feasible, I might not be the right man to work on it, as I have not touched the internals of cassandra for more than a year. I just want to contribute something to help us reach consensus. [1] https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103 best regards, hanzhu On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis jbel...@gmail.com wrote: you have misunderstood.
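The write and read paths Zhu Han outlines can be sketched in a few lines. This is a toy model of the proposal (the class and names are invented for illustration, not Cassandra code): each node owns exactly one sub-column of the counter, updates only that one, and a read sums all of them.

```python
# Sketch of the shard-per-node counter proposed above: the counter
# "super column" holds one sub-column per node; a node only ever writes
# the column named after itself, and the counter's value is the sum.
class ShardedCounter:
    def __init__(self):
        self.shards = {}  # node name -> that node's partial count

    def add(self, node: str, delta: int) -> None:
        # WRITE PATH: update only our own column atomically; the updated
        # (node, value) pair then replicates like a normal column mutation.
        self.shards[node] = self.shards.get(node, 0) + delta

    def value(self) -> int:
        # READ PATH: read the super column as a whole and sum all columns.
        return sum(self.shards.values())

c = ShardedCounter()
c.add("node-a", 5)
c.add("node-b", 3)
c.add("node-a", -1)
print(c.value())  # 7
```

The design choice this illustrates: because each replica writes a distinct column, concurrent increments on different nodes never produce a write conflict, so ordinary column replication and read repair can carry the data unchanged.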
if we continue the 1072 approach of writing counter data to the clock field, this is necessarily incompatible with the right way of writing counter data to the value field. it's no longer simply a matter of reverting 1070. On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han schumi@gmail.com wrote: Jonathan, This is a personal email. On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis jbel...@gmail.com wrote: On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han schumi@gmail.com wrote: Can we just let the patch be committed but mark it as alpha or experimental? I explained exactly why that is not a good approach here: http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html Yes, I see. But the clock structure has been in trunk since Cassandra-1070. We still need to clean it out regardless, and we need somebody to volunteer to take on this work. Considering the complexity of Cassandra-1070, a programmer with in-depth knowledge of that patch would be preferable, and it will take some time to do.
Re: Locking in cassandra
On Mon, Aug 16, 2010 at 6:07 AM, Maifi Khan maifi.k...@gmail.com wrote: Hi How is the locking implemented in cassandra? Say, I have 10 nodes and I want to write to 6 nodes which is (n+1)/2. Not to be too pedantic, but you're misunderstanding how to use cassandra. When we talk about 'n' we mean the number of replicas for a given piece of data, not the total number of nodes. If you have 10 nodes, you shouldn't be writing a piece of data to 6 of them. -ryan
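Ryan's distinction can be made concrete with a little arithmetic. In the sketch below (illustrative only), quorum is derived from the replication factor, not from the cluster size; computing it from the 10-node cluster is exactly the mistaken reading he is correcting.

```python
# Quorum in Cassandra is defined per piece of data: it is a majority of
# that data's replicas (replication factor), not of the whole cluster.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

cluster_size = 10
replication_factor = 3   # a typical RF, chosen independently of cluster size

print(quorum(replication_factor))  # 2  -- a write touches 2 of 3 replicas
print(quorum(cluster_size))        # 6  -- the mistaken "6 of 10 nodes" reading
```

With RF=3 on a 10-node cluster, any single key lives on only 3 nodes; the other 7 are simply not involved in reads or writes for that key.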