Re: Welcome committer Johan Oskarsson
Thanks everyone. I'm really glad to see the project moving along at a healthy pace and that people are starting to use/test it in production environments. Keep up the good work! /Johan Jonathan Ellis wrote: The Cassandra PPMC has voted to add Johan Oskarsson as a committer to the Cassandra incubator project. Welcome, Johan -- or more correctly, thanks for your hard work! :) -Jonathan
Re: [VOTE] Website
+1. A great step forward from the current version and a good base to improve upon. /Johan Eric Evans wrote: The current website is quite ugly, and I don't know about you, but I'm itching to put the new project logo to use, so I'd like to propose publishing http://cassandra.deadcafe.org (to http://incubator.apache.org/cassandra). This is a slightly tweaked version of Daniel Lundin's work from CASSANDRA-231[1] (thanks Daniel!), and the content is nearly identical to what is on the current site. I do not consider this to be the final word on the matter, I think there is still much to be done. For example, the logo w/text is something I cobbled together in Gimp and should be considered a placeholder. Still, it's much better than what we currently have and we can incrementally improve it as we go forward. [1] https://issues.apache.org/jira/browse/CASSANDRA-231
Re: Graduation?
+1 on RTC for the reasons mentioned below.
/Johan

Jonathan Ellis wrote:
On Thu, Nov 5, 2009 at 3:29 PM, ant elder ant.el...@gmail.com wrote:
I think it could be tough to get Cassandra through a graduation vote on general@ while working with RTC. I know there are some other projects that use RTC, but it's usually only for stable or release branches, isn't it? Things seem to be going well these days, what are the issues with trying CTR now for a while?

So I've thought about this a lot since Paul's brief objection.

Historically I have been a huge non-fan of RTC. It can slow things down significantly with the overhead of switching between patchsets in various stages of review.

BUT. Git-svn makes that go away almost entirely. I am never blocking for code to be reviewed; I just go code something else in the meantime. I branch per-ticket so revisiting to incorporate feedback or commit is trivial. I don't feel like I am wasting time fighting the tools like I used to with svn. (Especially with http://github.com/eevans/git-jira-attacher/.) All the other committers have switched to git-svn as well.

I do think there should be room for individual discretion here. If you have a trivial change, just commit it and be done. But in general, I think the extra care of RTC is usually worth it for us. I see reviews becoming a lot more perfunctory / not happening at all if we just commit first. (Just about all my experience has been in CTR projects, both closed and OSS. This isn't just a theoretical concern, DESPITE the best of intentions that we'll do reviews, promise.)

So I would argue that RTC is working for us, making sure reviews actually happen, while git makes it mostly stay out of our way. I _would_ be in favor of being less dogmatic about it (https://issues.apache.org/jira/browse/CASSANDRA-528 from earlier today is a fine example) but in general I prefer not fixing what ain't broke.
-Jonathan
Re: ApacheCon2009
I'll be attending the conference as well as the barcamp/hackathon days before. I'm up for a meet with a bit of coding if someone else organizes it :) /Johan Eric Evans wrote: ApacheCon is next month (November 2-6), I'll be there, how many others are planning to attend? Is there any interest in organizing a meetup/bug squashing party/hack-a-thon that week?
Re: Time to release 0.3
Is this a release vote for the rc2? +1 based on running unit tests and running vpork against a three-node cluster.
/Johan

Jonathan Ellis wrote:
CHANGES.txt has been added. RC2 is at http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc2.tar.gz
-Jonathan

On Sun, Jun 7, 2009 at 9:08 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
+1 for changelog, then I guess it's time to roll another release candidate, announce it on the dev list and let people vote on it? I guess with the incubator there are extra steps after that, but I assume the mentors will let us know.
/Johan

Chris Goffinet wrote:
ChangeLog? :)

On Jun 5, 2009, at 1:00 PM, Jonathan Ellis wrote:
The consensus was that it's better to release an imperfect-but-stable 0.3 now. We've resolved all the 0.3 issues in jira, added a BUGS.txt, and amended our NOTICES to include those of our dependencies. What's next?
-Jonathan
Re: 0.3 and the OOM gremlin (CASSANDRA-208)
+1 for getting an Apache release out there as soon as possible to show that the project is alive. If we can resolve the following in some way I think it's ok to push this issue to 0.4.0:
* We should make sure that end users are aware of this bug, in a known issues file or the readme for example, with a link to the jira ticket and a description of how it happens and how to avoid it.
* Write up how the versions are compatible with each other; as mentioned on IRC, the 0.3.0 and 0.4.0 data files would not be compatible.
* Work out roughly how common this problem will be; if all new users will hit it, the release won't really be of much use.
* Since the data files will be incompatible between versions, do we plan on bundling an upgrade tool? If not now, when? After 1.0?
/Johan

Jonathan Ellis wrote:
So, in light of Sandeep's point, I think I would prefer to do 0.3 now, and try to do a short 0.4 cycle with current trunk and
- the sstable redesign to address OOM problem
- multitable support
- patch to reduce logging impact so we look better in benchmarks :)
- fsync fix
- r/m old get_slice and rename get_slice_from to get_slice
How does that sound?
-Jonathan

On Wed, Jun 3, 2009 at 4:59 PM, Jonathan Ellis jbel...@gmail.com wrote:
You are right. Of course, there's no sense in making such a tool harder to write than it needs to be. But I don't care that strongly since I won't be writing it. :P
-Jonathan

On Wed, Jun 3, 2009 at 4:53 PM, Sandeep Tata sandeep.t...@gmail.com wrote:
Won't things like multi-table support break binary compatibility anyway? We might be stuck with having to write a tool that migrates from a 0.3 format to a 0.4 format.

On Wed, Jun 3, 2009 at 2:44 PM, Jonathan Ellis jbel...@gmail.com wrote:
The fix for 208 [1] is fairly invasive. Should we (a) release another RC and do more testing before 0.3 final, or (b) release 0.3 without these changes, and only add this fix to trunk?
Although I see the 0.3 release primarily as a means to let people start playing with the cassandra data model, I don't know that I want to release it knowing that 0.4 is going to be binary-incompatible with the 0.3 data files. So I'd be inclined towards (a). [1] https://issues.apache.org/jira/browse/CASSANDRA-208 -Jonathan
NOTICE file reqs?
I'm poking around in the incubator guide to understand the requirements for a first release. I'm having a hard time figuring out exactly what is needed when it comes to NOTICE files. We have one that is copied from the Hadoop project:
http://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.3/NOTICE.txt

In this ticket I have suggested adding a .LICENSE file for each of the jar files we depend on, to make it clear what license they are under. Is that sufficient?
https://issues.apache.org/jira/browse/CASSANDRA-176
/Johan
First release requirements
What exactly is needed before we can release the first version of Cassandra, once the issues assigned to 0.3.0 in Jira are resolved? I have gone through this document and tried to fix as many of the bits that jumped out at me as I could:
http://incubator.apache.org/guides/releasemanagement.html

We are in the process of fixing or have fixed artifact naming, license headers, source dist and dependency licensing.
* Do we need someone from Apache to check the legal bits?
* See other email about NOTICE files
* Other concerns from the mentors?
/Johan
Re: Versioning scheme
I guess this time it's my OCD that thinks having a 0.3 and then a 0.3.1 feels wrong, something missing on the first one :) /Johan Jonathan Ellis wrote: There's nothing in 0.3 that implies there won't be a 0.3.1. On Thu, May 14, 2009 at 12:48 PM, Johan Oskarsson jo...@oskarsson.nu wrote: The current versions in jira are 0.3 and 0.4, should we not explicitly mention the point release? For example 0.3.0, to make it consistent when we release bug fixes in 0.3.1 Thoughts? /Johan
Development process (was: working together)
Thanks Sandeep. Would we all be comfortable adopting this process going forward, hopefully reducing friction, bugs and problems in general? I assume +1 from me and Sandeep so far.
/Johan

Sandeep Tata wrote:
Johan, the wiki pages are great! I think they will help iron out our process for contributing and committing. (I added a pointer to the formatting conventions in HowToContribute; I can't think of anything else to add.)

http://cwiki.apache.org/confluence/display/CSDR/HowToContribute
http://cwiki.apache.org/confluence/display/CSDR/HowToCommit
http://cwiki.apache.org/confluence/display/CSDR/HowToRelease

A short summary and description of why these points make sense:
* Patch-only evolution of code, attached to a jira issue
* At least one +1 on each issue before it can be committed, -1 stops the patch.
Those two points would make sure that if someone disagrees with a change, a refactoring etc, they have a chance to voice their opinion and steer it in the right direction.
* Trunk is not considered stable, but must pass unit tests
* Any non-trivial change should include unit tests
* When a branch is created to prepare for a release, extra effort is put into QA to make sure the release is as stable as possible. Point releases would then go out to fix issues found after the release was done.
* Once a release has been out for a while and people are using it in production without problems it is upgraded to stable status.
The purpose of these points is to encourage a vibrant codebase, to not be afraid of, for example, refactoring if it improves the code readability or testability. I appreciate that Cassandra is a complex system and that changes might have unwanted side effects, but hopefully adding tests and code reviews will reduce those. As a final catch-all the release candidate and stable release process should help end users avoid bugs.

Thoughts on the wiki pages? Do they help resolve some of the problems?
Re: working together
+1 for Sandeep's development process suggestions.

In order to address some of the issues brought forward in this thread I have adapted the following wiki pages from other projects and from various emails. They could serve as the basis for an initial process.

http://cwiki.apache.org/confluence/display/CSDR/HowToContribute
http://cwiki.apache.org/confluence/display/CSDR/HowToCommit
http://cwiki.apache.org/confluence/display/CSDR/HowToRelease

A short summary and description of why these points make sense:
* Patch-only evolution of code, attached to a jira issue
* At least one +1 on each issue before it can be committed, -1 stops the patch.
Those two points would make sure that if someone disagrees with a change, a refactoring etc, they have a chance to voice their opinion and steer it in the right direction.
* Trunk is not considered stable, but must pass unit tests
* Any non-trivial change should include unit tests
* When a branch is created to prepare for a release, extra effort is put into QA to make sure the release is as stable as possible. Point releases would then go out to fix issues found after the release was done.
* Once a release has been out for a while and people are using it in production without problems it is upgraded to stable status.
The purpose of these points is to encourage a vibrant codebase, to not be afraid of, for example, refactoring if it improves the code readability or testability. I appreciate that Cassandra is a complex system and that changes might have unwanted side effects, but hopefully adding tests and code reviews will reduce those. As a final catch-all the release candidate and stable release process should help end users avoid bugs.

Thoughts on the wiki pages? Do they help resolve some of the problems?
/Johan

Sandeep Tata wrote:
Thoughts inline:

So the problems I am seeing are:

1. We elected a committer without real community consensus. The barrier of entry was unnaturally low on this one.
On the other hand we need non-FB committers for the graduation. The more the better. (No reason for low entry barrier though!)

I think everyone (including the FB guys) agree that Jonathan has been working hard to help move the codebase forward. He has been quick to revert changes that broke the code that the FB guys had in the pipeline and have committed since. I think much of the friction comes from not having a process, which takes us to Torsten's #2:

2. A missing definition of development process:
- What is considered a valid code review?
- How much are changes discussed up-front?
- What is the roadmap? ...for whom? (weighted as a community)

This is probably where we need most work. Here are some simple suggestions:
a) I'm a fan of a patch-only evolution of code. All changes come from patches, and no changes come from anywhere else (e.g. the committer's IDE). Even if it is something as simple as cleaning up comments or changing a variable name.
b) A patch gets applied if at least one reviewer +1s it, and no one -1s it.
c) A patch should pass all unit tests. Any significant patch should come with additional unit tests.
Some of this, of course, will mean more work for the committers. Sure, but such processes are essential if the project is to grow beyond a small group of core contributors.

3. Is trunk considered stable? Or aren't we missing a stable branch for the required stability? Once we have the separation between stable and trunk: Will patches really find their way from trunk into stable? Is Facebook OK with that approach? Will everyone cope with the additional work of merging? Would it be useful ...or overkill to use merge tracking?

I agree with Matt. Trunk should pass build + tests, but should not be trusted for production. I think 0.2 was supposed to be a stable branch. Avinash, Prashant -- what are your thoughts on this? Are you guys comfortable with this approach? Do you foresee any problems? Basically, use a release branch for production.
The release branches only admit stability patches. New feature and cleanup patches go to trunk. Folks running Cassandra in production only need to be nervous when moving from one release to the next, and not worry too much about every single patch breaking their running system.

4. Real world testing feedback is not publicly available. So the feedback on changes will only slowly reach the community. This is not easy for a project like this. But is there a faster way to provide testing feedback? (IIRC Yahoo was providing testing feedback for Hadoop. They even try to auto-apply patches from JIRA)

With time, FB may be able to provide feedback from their "divert some traffic to the new version" system. Auto-applying patches from JIRA sounds a little ambitious right now :-)

5. Is there really no code ownership issue? Working on a code base for 1-2 years can get you attached to the code you have written. Can everyone really let go? Is it OK if someone else really
Re: Website [WAS Re: Wiki]
We now have a first version of the site running: http://incubator.apache.org/cassandra However, Matthieu commented on the wiki ticket here: https://issues.apache.org/jira/browse/CASSANDRA-15 that he wants a decision on the site's future before moving on. The two options as far as I know: 1. Store the site and any source material in svn, publish to the apache.org site. It could be raw html, forrest xml+generated html, a script that generates html or something similar. 2. Use a wiki as the site. Confluence? Personally I prefer option 1, so we can accept website patches from anyone and leave the wiki as a publicly editable place for everyone to share information. I don't care if we use Forrest or not for the site, it was just an easy way to get started and a lot of Apache projects use it already. If option 1 is chosen we can discuss what tool to use later. What does everyone think? Should we initiate a vote about it? /Johan Sandeep Tata wrote: I think the Forrest site is great to start with. If you check it into the repository, others will be able to contribute patches much like code and the burden of building up the website will not fall on just the committers. A publicly editable wiki might work too, but if only the committers have edit permissions -- much of the work for building the site falls on them. We want to quickly get to a point where the committers can review patches and guide the community in adding value to the code. Not be bogged down in editing websites :) On Wed, Mar 18, 2009 at 10:56 AM, Matthieu Riou matthieu.r...@gmail.com wrote: On Wed, Mar 18, 2009 at 10:30 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote: I guess. Isn't that easier? Is there something else that is the norm? No norm, just several different options. I personally like when the source used to generate the site can be checked in the repository but that's mostly a matter of taste. 
This for example is generated with Forrest:
http://ant.apache.org/
http://lucene.apache.org/
This is generated by a set of Ruby scripts from Textile files:
http://buildr.apache.org/
And this is generated from Confluence using custom templates:
http://geronimo.apache.org/
http://ode.apache.org/
FWIW, a Forrest site has already been contributed so that could be used to start with until a sexier option is implemented?
Matthieu

Avinash

On Wed, Mar 18, 2009 at 10:28 AM, Matthieu Riou matthieu.r...@gmail.com wrote:
On Wed, Mar 18, 2009 at 10:25 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote:
I just got myself added as an Author maybe a couple of days ago. I will start working on it. But there are other things too on my plate. I will get around to it soon.

So does that mean that you plan to use the wiki as a website?
Matthieu

Avinash

On Wed, Mar 18, 2009 at 9:21 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
Prashant, Avinash, Jiansheng... Any update on the website? It has been several weeks since the project got into the incubator, but we still don't seem to have a website. I think using Johan's Forrest-generated site to start with is a good idea. Are you guys considering anything else? What's holding us up? Do you need any help with this?
Sandeep

On Wed, Mar 11, 2009 at 11:12 AM, Matthieu Riou matthieu.r...@gmail.com wrote:
On Wed, Mar 11, 2009 at 11:08 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
Ok, I understand. On the topic of website I have suggested and created a basic Forrest generated site here: https://issues.apache.org/jira/browse/CASSANDRA-2

Nice!

No response from any of the committers on the jira or on the lists yet though.

I believe they weren't subscribed yet :) So what do other folks think about the website and using Forrest at least to get started?
Cheers, Matthieu

/Johan

Matthieu Riou wrote:
On Wed, Mar 11, 2009 at 10:59 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
Out of curiosity is there a reason to use a wiki without public edit access?
Apache Hadoop, for example, has had one that is editable by anyone and they have not had any problems afaik. On the contrary I believe a lot of useful updates wouldn't happen with a wiki restricted to committers only.

I've started this way just because I don't know how the wiki is going to be used by the project yet. Some projects use the Confluence wiki as their website and even bundle it as part of their distribution. In that context, only committers should be able to contribute. I don't think it's the case for Hadoop, they have a separate Forrest-generated website. So depending on how the wiki is going to be used and how the Apache Cassandra website will be built, we can decide to have it work either way.
Cheers, Matthieu

/Johan

Matthieu Riou wrote:
Hi guys, I've just created a new Confluence space for Cassandra, Avinash's planning to use it and most projects find it handy anyway. If other committers want edit access to it, please send me your