Re: Welcome committer Johan Oskarsson

2009-11-21 Thread Johan Oskarsson
Thanks everyone. I'm really glad to see the project moving along at a 
healthy pace and that people are starting to use/test it in production 
environments.


Keep up the good work!

/Johan

Jonathan Ellis wrote:

The Cassandra PPMC has voted to add Johan Oskarsson as a committer to
the Cassandra incubator project.  Welcome, Johan -- or more correctly,
thanks for your hard work! :)

-Jonathan




Re: [VOTE] Website

2009-11-11 Thread Johan Oskarsson
+1. A great step forward from the current version and a good base to 
improve upon.


/Johan

Eric Evans wrote:

The current website is quite ugly, and I don't know about you, but I'm
itching to put the new project logo to use, so I'd like to propose
publishing http://cassandra.deadcafe.org (to
http://incubator.apache.org/cassandra).

This is a slightly tweaked version of Daniel Lundin's work from
CASSANDRA-231[1] (thanks Daniel!), and the content is nearly identical
to what is on the current site.

I do not consider this to be the final word on the matter, I think there
is still much to be done. For example, the logo w/text is something I
cobbled together in Gimp and should be considered a placeholder. Still,
it's much better than what we currently have and we can incrementally
improve it as we go forward.


[1] https://issues.apache.org/jira/browse/CASSANDRA-231





Re: Graduation?

2009-11-05 Thread Johan Oskarsson

+1 on RTC for the reasons mentioned below.

/Johan

Jonathan Ellis wrote:

On Thu, Nov 5, 2009 at 3:29 PM, ant elder ant.el...@gmail.com wrote:

I think it could be tough to get Cassandra through a graduation vote
on general@ while working with RTC. I know there are some other
projects that use RTC, but it's usually only for stable or release
branches isn't it?

Things seem to be going well these days, what are the issues with
trying CTR now for a while?


So I've thought about this a lot since Paul's brief objection.

Historically I have been a huge non-fan of RTC.  It can slow things
down significantly with the overhead of switching between patchsets in
various stages of review.

BUT.

Git-svn makes that go away almost entirely.  I am never blocking for
code to be reviewed; I just go code something else in the meantime.  I
branch per-ticket so revisiting to incorporate feedback or commit is
trivial.  I don't feel like I am wasting time fighting the tools like
I used to with svn.  (Especially with
http://github.com/eevans/git-jira-attacher/.)  All the other
committers have switched to git-svn as well.

I do think there should be room for individual discretion here.  If
you have a trivial change, just commit it and be done.  But in
general, I think the extra care of RTC is usually worth it for us.  I
see reviews becoming a lot more perfunctory / not happening at all if
we just commit first.  (Just about all my experience has been in CTR
projects, both closed and OSS.  This isn't just a theoretical concern,
DESPITE the best of intentions that we'll do reviews, promise.)

So I would argue that RTC is working for us, making sure reviews
actually happen, while git makes it mostly stay out of our way.  I
_would_ be in favor of being less dogmatic about it
(https://issues.apache.org/jira/browse/CASSANDRA-528 from earlier
today is a fine example) but in general I prefer not fixing what ain't
broke.

-Jonathan




Re: ApacheCon2009

2009-10-08 Thread Johan Oskarsson
I'll be attending the conference as well as the barcamp/hackathon days 
before. I'm up for a meet with a bit of coding if someone else organizes 
it :)


/Johan

Eric Evans wrote:

ApacheCon is next month (November 2-6), I'll be there, how many others
are planning to attend? Is there any interest in organizing a meetup/bug
squashing party/hack-a-thon that week?





Re: Time to release 0.3

2009-06-17 Thread Johan Oskarsson
Is this a release vote for the rc2?

+1 based on running unit tests and running vpork against a three node
cluster.

/Johan

Jonathan Ellis wrote:
 CHANGES.txt has been added.
 
 RC2 is at 
 http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc2.tar.gz
 
 -Jonathan
 
 On Sun, Jun 7, 2009 at 9:08 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
 +1 for changelog, then I guess it's time to roll another release candidate,
 announce it on the dev list and let people vote on it?

 I guess with the incubator there's extra steps after that, but I assume the
 mentors will let us know.

 /Johan

 Chris Goffinet wrote:
 ChangeLog? :)

 On Jun 5, 2009, at 1:00 PM, Jonathan Ellis wrote:

 The consensus was that it's better to release an imperfect-but-stable 0.3
 now.

 We've resolved all the 0.3 issues in jira, added a BUGS.txt, and
 amended our NOTICES to include those of our dependencies.

 What's next?

 -Jonathan




Re: 0.3 and the OOM gremlin (CASSANDRA-208)

2009-06-04 Thread Johan Oskarsson
+1 for getting an Apache release out there as soon as possible to show
that the project is alive.

If we can resolve the following in some way I think it's ok to push this
issue to 0.4.0:

* We should make sure that end users are aware of this bug, in a known
issues file or the readme for example, with a link to the jira ticket
and a description of how it happens and how to avoid it.
* Write up how each version is compatible with the others; as mentioned
on IRC, the 0.3.0 and 0.4.0 data files would not be compatible.
* Work out roughly how common this problem will be; if all new users
will hit it, the release won't really be of much use.
* Since the data files will be incompatible between versions, do we plan
on bundling an upgrade tool? If not now, when? After 1.0?

/Johan

Jonathan Ellis wrote:
 So, in light of Sandeep's point, I think I would prefer to do 0.3 now,
 and try to do a short 0.4 cycle with current trunk and
 
  - the sstable redesign to address OOM problem
  - multitable support
  - patch to reduce logging impact so we look better in benchmarks :)
  - fsync fix
  - r/m old get_slice and rename get_slice_from to get_slice
 
 How does that sound?
 
 -Jonathan
 
 On Wed, Jun 3, 2009 at 4:59 PM, Jonathan Ellis jbel...@gmail.com wrote:
 You are right.  Of course, there's no sense in making such a tool
 harder to write than it needs to be.

 But I don't care that strongly since I won't be writing it. :P

 -Jonathan

 On Wed, Jun 3, 2009 at 4:53 PM, Sandeep Tata sandeep.t...@gmail.com wrote:
 Won't things like multi-table support break binary compatibility anyway?

 We might be stuck with having to write a tool that migrates from a 0.3
 format to a 0.4 format.


 On Wed, Jun 3, 2009 at 2:44 PM, Jonathan Ellis jbel...@gmail.com wrote:
 The fix for 208 [1] is fairly invasive.  Should we

 (a) release another RC and do more testing before 0.3 final, or
 (b) release 0.3 without these changes, and only add this fix to trunk?

 Although I see the 0.3 release primarily as a means to let people
 start playing with the cassandra data model, I don't know that I want
 to release it knowing that 0.4 is going to be binary-incompatible with
 the 0.3 data files.  So I'd be inclined towards (a).

 [1] https://issues.apache.org/jira/browse/CASSANDRA-208

 -Jonathan




NOTICE file reqs?

2009-05-28 Thread Johan Oskarsson
I'm poking around in the incubator guide to understand the requirements
for a first release. I'm having a hard time figuring out exactly what is
needed when it comes to NOTICE files, we have one that is copied from
the Hadoop project:
http://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.3/NOTICE.txt


In this ticket I have suggested adding a .LICENSE file for each of the
jar files we depend on, to make it clear what license they are under. Is
that sufficient? https://issues.apache.org/jira/browse/CASSANDRA-176

/Johan


First release requirements

2009-05-28 Thread Johan Oskarsson
What exactly is needed before we can release the first version of
Cassandra, once the issues assigned to 0.3.0 in Jira are resolved?

I have gone through this document and tried to fix as many of the bits
that jumped at me: http://incubator.apache.org/guides/releasemanagement.html

We are in the process of fixing or have fixed artifact naming, license
headers, source dist and dependency licensing.

* Do we need someone from apache to check the legal bits?
* See other email about NOTICE files
* Other concerns from the mentors?

/Johan


Re: Versioning scheme

2009-05-14 Thread Johan Oskarsson
I guess this time it's my OCD that thinks having a 0.3 and then a 0.3.1
feels wrong, something missing on the first one :)

/Johan

Jonathan Ellis wrote:
 There's nothing in 0.3 that implies there won't be a 0.3.1.
 
 On Thu, May 14, 2009 at 12:48 PM, Johan Oskarsson jo...@oskarsson.nu wrote:
 The current versions in jira are 0.3 and 0.4, should we not explicitly
 mention the point release?

 For example 0.3.0, to make it consistent when we release bug fixes in 0.3.1

 Thoughts?

 /Johan
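
A minimal sketch of why the explicit point release helps, assuming
versions are plain dot-separated integers. The helper name version_key
and the three-component padding are illustrative assumptions, not
anything the project or Jira defines.

```python
def version_key(version, width=3):
    """Turn a dotted version string into a tuple padded with zeros.

    Hypothetical helper for illustration only: "0.3" and "0.3.0" map to
    the same key, so a later "0.3.1" sorts directly after either spelling.
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))  # no-op when already long enough
    return tuple(parts)


releases = ["0.4", "0.3.1", "0.3"]
ordered = sorted(releases, key=version_key)
```

With this ordering "0.3" is treated as shorthand for "0.3.0", which is
the consistency the explicit naming would make visible in Jira.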




Development process (was: working together)

2009-04-09 Thread Johan Oskarsson
Thanks Sandeep.

Would we all be comfortable adopting this process going forward,
hopefully reducing friction, bugs and problems in general?

I assume +1 from me and Sandeep so far.

/Johan

Sandeep Tata wrote:
 Johan, the wiki pages are great! I think they will help iron out our
 process for contributing and committing.
 
 (I added a pointer to the formatting conventions in HowToContribute,
 can't think of anything else to add)
 
 http://cwiki.apache.org/confluence/display/CSDR/HowToContribute
 http://cwiki.apache.org/confluence/display/CSDR/HowToCommit
 http://cwiki.apache.org/confluence/display/CSDR/HowToRelease

 A short summary and description of why these points make sense:
 * Patch-only evolution of code, attached to a jira issue
 * At least one +1 on each issue before it can be committed, -1 stops the
 patch.

 Those two points would make sure that if someone disagrees with a
 change, a refactoring etc, they have a chance to voice their opinion and
 steer it in the right direction.


 * Trunk is not considered stable, but must pass unit tests
 * Any non trivial change should include unit tests
 * When a branch is created to prepare for a release extra effort is put
 into QA to make sure the release is as stable as possible. Point
 releases would then go out to fix issues found after the release was done.
 * Once a release has been out for a while and people are using it in
 production without problems it is upgraded to stable status.

 The purpose of these points is to encourage a vibrant codebase, to not
 be afraid of for example refactoring if it improves the code readability
 or testability. I appreciate that Cassandra is a complex system and that
 changes might have unwanted side effects, but hopefully adding tests and
 code reviews will reduce those. As a final catch-all the release
 candidate and stable release process should help end users avoid bugs.


 Thoughts on the wiki pages? Do they help resolve some of the problems?

 /Johan

 Sandeep Tata wrote:
 Thoughts inline:

 So the problems I am seeing are:

 1. We elected a committer without real community consensus. The
 barrier of entry was unnaturally low on this one. On the other hand we
 need non-FB committers for the graduation. The more the better. (No
 reason for low entry barrier though!)
 I think everyone (including the FB guys) agrees that Jonathan has been
 working hard to help move the codebase forward. He has been quick to
 revert changes that broke the code that the FB guys had in the
 pipeline and have committed since. I think much of the friction comes
 from not having a process, which takes us to Torsten's #2:

 2. A missing definition of development process:
  - What is considered a valid code review?
  - How much are changes discussed up-front?
  - What is the roadmap? ...for whom? (weighted as a community)
 This is probably where we need most work. Here are some simple suggestions:

 a) I'm a fan of a patch-only evolution of code. All changes come
 from patches, and no changes come from anywhere else (eg. the
 committer's IDE). Even if it is something as simple as cleaning up
 comments or changing a variable name.
 b) A patch gets applied if at least one reviewer +1s it, and no one -1s it.
 c) A patch should pass all unit tests. Any significant patch should
 come with additional unit tests.

 Some of this, of course, will mean more work for the committers.
 Sure, but such processes are essential if the project is to grow
 beyond a small group of core contributors.

 3. Is trunk considered stable? Or aren't we missing a stable branch
 for the required stability? Once we have the separation between stable
 and trunk: Will patches really find their way from trunk into stable?
 Is Facebook OK with that approach? Will everyone cope with the
 additional work of merging? Would it be useful ...or overkill to use
 merge tracking?
 I agree with Matt. Trunk should pass build + tests, but should not be
 trusted for production. I think 0.2 was supposed to be a stable
 branch. Avinash, Prashant -- what are your thoughts on this? Are you
 guys comfortable with this approach? Do you foresee any problems?

 Basically, use a release branch for production. The release branches
 only admit stability patches. New feature and cleanup patches go to
 trunk. Folks running Cassandra in production only need to be nervous
 when moving from one release to next, and not worry too much about
 every single patch breaking their running system.

 4. Real world testing feedback is not publicly available. So the
 feedback on changes will only slowly reach the community. This is not
 easy for a project like this. But is there a faster way to provide
 testing feedback? (IIRC Yahoo was providing testing feedback for
 Hadoop. They even try to auto-apply patches from JIRA)
 With time, FB may be able to provide feedback from their "divert some
 traffic to the new version" system. Auto-applying patches from JIRA
 sounds a little ambitious right now :-)

 5. Is there 

Re: working together

2009-04-08 Thread Johan Oskarsson
+1 for Sandeep's development process suggestions.

In order to address some of the issues brought forward in this thread I
have adapted the following wiki pages from other projects and from
various emails. They could serve as the basis for an initial process.

http://cwiki.apache.org/confluence/display/CSDR/HowToContribute
http://cwiki.apache.org/confluence/display/CSDR/HowToCommit
http://cwiki.apache.org/confluence/display/CSDR/HowToRelease

A short summary and description of why these points make sense:
* Patch-only evolution of code, attached to a jira issue
* At least one +1 on each issue before it can be committed, -1 stops the
patch.

Those two points would make sure that if someone disagrees with a
change, a refactoring etc, they have a chance to voice their opinion and
steer it in the right direction.
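
The two points above can be sketched as a small check. This is only an
illustration under stated assumptions: the function name can_commit and
the encoding of votes as the integers +1 and -1 are hypothetical and do
not come from the wiki pages.

```python
def can_commit(votes):
    """Return True if a patch may be committed under the proposed rule.

    `votes` is a list of integers: +1 (approve) or -1 (veto).
    Hypothetical helper; the vote encoding is an assumption.
    """
    has_approval = any(v == 1 for v in votes)  # at least one +1
    has_veto = any(v == -1 for v in votes)     # a single -1 stops the patch
    return has_approval and not has_veto
```

A patch with no votes at all is not committable under this reading,
since the rule asks for at least one +1.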


* Trunk is not considered stable, but must pass unit tests
* Any non trivial change should include unit tests
* When a branch is created to prepare for a release extra effort is put
into QA to make sure the release is as stable as possible. Point
releases would then go out to fix issues found after the release was done.
* Once a release has been out for a while and people are using it in
production without problems it is upgraded to stable status.

The purpose of these points is to encourage a vibrant codebase, to not
be afraid of for example refactoring if it improves the code readability
or testability. I appreciate that Cassandra is a complex system and that
changes might have unwanted side effects, but hopefully adding tests and
code reviews will reduce those. As a final catch-all the release
candidate and stable release process should help end users avoid bugs.


Thoughts on the wiki pages? Do they help resolve some of the problems?

/Johan

Sandeep Tata wrote:
 Thoughts inline:
 
 So the problems I am seeing are:

 1. We elected a committer without real community consensus. The
 barrier of entry was unnaturally low on this one. On the other hand we
 need non-FB committers for the graduation. The more the better. (No
 reason for low entry barrier though!)
 
 I think everyone (including the FB guys) agrees that Jonathan has been
 working hard to help move the codebase forward. He has been quick to
 revert changes that broke the code that the FB guys had in the
 pipeline and have committed since. I think much of the friction comes
 from not having a process, which takes us to Torsten's #2:
 
 2. A missing definition of development process:
  - What is considered a valid code review?
  - How much are changes discussed up-front?
  - What is the roadmap? ...for whom? (weighted as a community)
 
 This is probably where we need most work. Here are some simple suggestions:
 
 a) I'm a fan of a patch-only evolution of code. All changes come
 from patches, and no changes come from anywhere else (eg. the
 committer's IDE). Even if it is something as simple as cleaning up
 comments or changing a variable name.
 b) A patch gets applied if at least one reviewer +1s it, and no one -1s it.
 c) A patch should pass all unit tests. Any significant patch should
 come with additional unit tests.
 
 Some of this, of course, will mean more work for the committers.
 Sure, but such processes are essential if the project is to grow
 beyond a small group of core contributors.
 
 3. Is trunk considered stable? Or aren't we missing a stable branch
 for the required stability? Once we have the separation between stable
 and trunk: Will patches really find their way from trunk into stable?
 Is Facebook OK with that approach? Will everyone cope with the
 additional work of merging? Would it be useful ...or overkill to use
 merge tracking?
 
 I agree with Matt. Trunk should pass build + tests, but should not be
 trusted for production. I think 0.2 was supposed to be a stable
 branch. Avinash, Prashant -- what are your thoughts on this? Are you
 guys comfortable with this approach? Do you foresee any problems?
 
 Basically, use a release branch for production. The release branches
 only admit stability patches. New feature and cleanup patches go to
 trunk. Folks running Cassandra in production only need to be nervous
 when moving from one release to next, and not worry too much about
 every single patch breaking their running system.
 
 4. Real world testing feedback is not publicly available. So the
 feedback on changes will only slowly reach the community. This is not
 easy for a project like this. But is there a faster way to provide
 testing feedback? (IIRC Yahoo was providing testing feedback for
 Hadoop. They even try to auto-apply patches from JIRA)
 
 With time, FB may be able to provide feedback from their "divert some
 traffic to the new version" system. Auto-applying patches from JIRA
 sounds a little ambitious right now :-)
 
 5. Is there really no code ownership issue? Working on a code base for
 1-2 years can get you attached to the code you have written. Can
 everyone really let go? Is it OK if someone else really 

Re: Website [WAS Re: Wiki]

2009-03-29 Thread Johan Oskarsson
We now have a first version of the site running: 
http://incubator.apache.org/cassandra


However, Matthieu commented on the wiki ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-15
that he wants a decision on the site's future before moving on.

The two options as far as I know:

1. Store the site and any source material in svn, publish to the 
apache.org site. It could be raw html, forrest xml+generated html, a 
script that generates html or something similar.


2. Use a wiki as the site. Confluence?


Personally I prefer option 1, so we can accept website patches from 
anyone and leave the wiki as a publicly editable place for everyone to 
share information.


I don't care if we use Forrest or not for the site, it was just an easy 
way to get started and a lot of Apache projects use it already. If 
option 1 is chosen we can discuss what tool to use later.


What does everyone think? Should we initiate a vote about it?

/Johan

Sandeep Tata wrote:

I think the Forrest site is great to start with. If you check it into
the repository, others will be able to contribute patches much like
code and the burden of building up the website will not fall on just
the committers.

A publicly editable wiki might work too, but if only the committers
have edit permissions -- much of the work for building the site falls
on them. We want to quickly get to a point where the committers can
review patches and guide the community in adding value to the code.
Not be bogged down in editing websites :)



On Wed, Mar 18, 2009 at 10:56 AM, Matthieu Riou matthieu.r...@gmail.com wrote:

On Wed, Mar 18, 2009 at 10:30 AM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:


I guess. Isn't that easier? Is there something else that is the norm?


No norm, just several different options. I personally like when the source
used to generate the site can be checked in the repository but that's mostly
a matter of taste. This for example is generated with Forrest:

http://ant.apache.org/
http://lucene.apache.org/

This is generated by a set of Ruby scripts from Textile files:

http://buildr.apache.org/

And this is generated from Confluence using custom templates:

http://geronimo.apache.org/
http://ode.apache.org/

FWIW, a Forrest site has already been contributed so that could be used to
start with until a sexier option is implemented?

Matthieu





Avinash


On Wed, Mar 18, 2009 at 10:28 AM, Matthieu Riou matthieu.r...@gmail.com wrote:


On Wed, Mar 18, 2009 at 10:25 AM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:


I just got myself added as an Author maybe couple of days ago. I will
start working on it. But there are other things too on my plate. I will
get around to it soon


So does that mean that you plan to use the wiki as a website?

Matthieu



Avinash

On Wed, Mar 18, 2009 at 9:21 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
Prashant, Avinash, Jiansheng...

Any update on the website?
It has been several weeks since the project got into the incubator,
but we still don't seem to have a website.

I think using Johan's Forrest-generated site to start with is a good
idea. Are you guys considering anything else? What's holding us up? Do
you need any help with this?

Sandeep

On Wed, Mar 11, 2009 at 11:12 AM, Matthieu Riou matthieu.r...@gmail.com wrote:

On Wed, Mar 11, 2009 at 11:08 AM, Johan Oskarsson jo...@oskarsson.nu wrote:


Ok, I understand.

On the topic of website I have suggested and created a basic Forrest
generated site here:
https://issues.apache.org/jira/browse/CASSANDRA-2


Nice!



No response from any of the committers on the jira or on the lists yet
though.


I believe they weren't subscribed yet :) So what do other folks think
about the website and using Forrest at least to get started?

Cheers,
Matthieu



/Johan

Matthieu Riou wrote:

On Wed, Mar 11, 2009 at 10:59 AM, Johan Oskarsson jo...@oskarsson.nu wrote:


Out of curiosity is there a reason to use a wiki without public edit
access? Apache Hadoop, for example, have had one that is editable by
anyone and they have not had any problems afaik. On the contrary I
believe a lot of useful updates wouldn't happen with a wiki restricted
to committers only.


I've started this way just because I don't know how the wiki is going to
be used by the project yet.

Some projects use the Confluence wiki as their website and even bundle
it as part of their distribution. In that context, only committers
should be able to contribute. I don't think it's the case for Hadoop,
they have a separate Forrest-generated website.

So depending on how the wiki is going to be used and how the Apache
Cassandra website will be built, we can decide to have it work either
way.

Cheers,
Matthieu



/Johan

Matthieu Riou wrote:

Hi guys,

I've just created a new Confluence space for Cassandra, Avinash's
planning to use it and most projects find it handy anyway. If other
committers want edit access to it, please send me your