On Aug 20, 2008, at 10:10 AM, Karl Wettin wrote:

I think it would be nice to get it out ASAP, perhaps even by next weekend? I'll get started on the HowToRelease wiki page right now.

Anything is possible, I suppose. I'll do what I can, but I am also planning a Solr release for next week, so...




I also have a bunch of post-0.1 thoughts:

We could post a wishlist/plan for 0.2 in the 0.1 release announcement. This would probably just be a link to a wiki page (which does not exist yet) where we list what people are working on that may or may not become something. This could turn out to be a catalyst, and if nothing else it could help consolidate work taking place outside of the fora and so avoid duplicate effort. Or would it be better to fill JIRA with that sort of stuff? It would be nice if we did not end up with a thousand old, open issues without patches.

I'm open to anything, but I've always found coordinating O/S projects to be like the proverbial cat herding problem. I'd love to get Hadoop's patch checker system in place for Mahout on Hudson; I think this can help w/ the bad patch problem. Of course, the flip side to the thousand old issues is the stale wiki. I don't know a good solution, as they all rely on people to be involved and take up the work to maintain. Or perhaps we can come up w/ a cool Mahout application that we train on JIRA to classify issues into good, maybe, and bad, and we automatically close/mark any issue that is labeled as bad. :-) Might make for a cool, real world application that would benefit a whole ton of projects in the ASF alone. Argh, where's that cloning machine when you need it? Just not enough hours in the day.




Also, one way to potentially get lots of users at release is to introduce a simple band-aid between a Lucene index and Mahout. No need to make it as complex as MAHOUT-7; something that converts the term vector of a document to a SparseVector, using term identity as the column, would be enough. Those who don't want the term vectors in their index could use some layer that pre-analyzes a Document at index time (replacing the fields with the stream) and passes down the vectors in some format that makes sense for Mahout.

I think the Bayes stuff has some of this ground work, namely the examples use Lucene to analyze the articles and put them in the Bayes format.
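For readers following along, the term-vector conversion Karl describes might be sketched roughly as below. This is only an illustration under stated assumptions: a plain list of analyzed terms stands in for a Lucene term vector, a dict of column index to term frequency stands in for Mahout's SparseVector, and the names build_dictionary and to_sparse_vector are hypothetical, not part of either project.

```python
from collections import Counter


def build_dictionary(docs):
    """Assign each distinct term a column index, in first-seen order.

    `docs` is an iterable of term lists; each list stands in for the
    analyzed terms of one document (an assumption, not the Lucene API).
    """
    dictionary = {}
    for terms in docs:
        for term in terms:
            # setdefault only assigns a new index the first time a term is seen
            dictionary.setdefault(term, len(dictionary))
    return dictionary


def to_sparse_vector(terms, dictionary):
    """Map a document's terms to {column index: term frequency}.

    The dict is a stand-in for a sparse vector: only non-zero columns
    are stored. Terms missing from the dictionary are skipped.
    """
    counts = Counter(terms)
    return {dictionary[t]: float(c) for t, c in counts.items() if t in dictionary}


# Two toy "documents", already analyzed into terms.
docs = [
    ["mahout", "lucene", "index", "lucene"],
    ["mahout", "vector"],
]
dictionary = build_dictionary(docs)
vectors = [to_sparse_vector(d, dictionary) for d in docs]
```

Using the term's identity as the column keeps the mapping stable across documents, so every vector produced from the same dictionary lives in the same space.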




I for one am working on MAHOUT-19, using -61 (mbox/nntp->matrix) for examples and trying to come up with a new take on -65 (meta data), as -61 can make use of that. I'm also looking closer at cross-fold validation to power various feature selection schemes, but this is a bit secondary.

Cool. Once we get the release out, I plan on building an Amazon AMI for it and putting up docs on it, as well as start doing some tests, using the new NB/CNB Wikipedia stuff, and maybe also setting up an example using DMOZ or something like that as a POC.

I would also love to get in a SVM implementation for 0.2.





On 20 Aug 2008, at 14:59, Grant Ingersoll wrote:

Hi Mahouters,

I'd like to suggest we start gearing up for a 0.1 release. Since this is our first one, we're going to have a bit of extra work to get things in the right shape, so any extra time you have would be most appreciated.

First and foremost would be testing, etc., on the current trunk (assuming SVN is up, which it doesn't appear to be right now) and providing feedback on what's good and bad. This is especially true of people who have access to clusters (which many of us committers will soon have, thanks to a kind donation by Amazon).

Second, we should go through JIRA and (un)mark issues as either in or out of 0.1, or closed. See https://issues.apache.org/jira/browse/MAHOUT/fixforversion/12312976 for the current list. Of these, MAHOUT-9, 56 and 60 are all pretty much done; they just need a bit more documentation. M-54 looks like it could be closed, right Jeff, as the reporter hasn't responded to questions, etc.? So, if you have something you think should be in 0.1, please go mark it as such in JIRA.

Next, we need to address https://issues.apache.org/jira/browse/MAHOUT-69 , at a minimum. One of us should look at other ASF projects (Lucene/Solr) and grab their "How To Make a Release" documentation (on the wiki) and put it up on our wiki. Volunteers?

After that, I'd suggest we are ready for a release. Typically, we call a "freeze" date, and then we release a series of release candidates. For Mahout, since we are so young and this is such an early release, I don't think we need to obsess too much over this. Our APIs are likely to change in the future, so we should just keep things light: release early, release often. I volunteer to be the release manager.

With the release ready to go, then we can go out and make some noise, to help attract more people, etc. We can work w/ the ASF PRC (public relations committee) on this a bit, I think. Additionally, those of us who blog should do so. I'd also think it would be great if anyone with Wikipedia savviness could put us on the map there. Currently, Wikipedia Mahout is: http://en.wikipedia.org/wiki/Mahout but I think we could make it a "disambiguation" page, or at least add in an Apache Mahout page. Just food for thought... Our community is actually pretty big for a new project, or at least the number of lurkers is pretty big. I think a number of people are in "wait and see" mode, so we (i.e. committers and active contributors) need to get over the hump a bit so that others will feel more comfortable joining in. An official release should help with that, but do let us know if you have other ideas as well.

Timewise, I'd love it if we could have the release out within the month, but of course, I know we are all busy. That being said, we've got a lot of goodness in our repo now, what w/ Taste, Clustering, the GA stuff and the Naive Bayes stuff (kudos to our two active GSoC students, Deneche and Robin!).

Cheers,
Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






