I didn't get a strong sense from the Hadoop community that 0.21 is all that well baked. To quote the website: "This release contains many improvements, new features, bug fixes and optimizations. It has not undergone testing at scale and should not be considered stable or suitable for production. This release is being classified as a minor release, which means that it should be API compatible with 0.20.2."
If they can't give it a vote of confidence, then I don't think we should either.

It also reminds me that I think we should, at a minimum, have a conversation about ways we might insulate ourselves a little from Hadoop while still harnessing all of its power. Ted and I talked about it a bit at the Bay Area meetup we had a few months ago. The Plume/Flume stuff seems promising for helping with that, as well as giving some other benefits, but it relies on us having an open source version of Flume (which Ted and others have started). I don't know that it is all that practical in the short term, and I'm not proposing any rewrites at this point, but we should consider it: working at that layer might let us plug in different backends that perform better in particular setups (local, small cluster, large cluster). A bit of insulation like that might allow us to plug in other capabilities as well. One of the things Hadoop has spawned is a whole lot more interest in these kinds of capabilities, and I fully expect to see new and related paradigms coming out. Obviously, we aren't going to jump on just anything, but it would be worth thinking about ways we might be able to plug them in.

Thoughts?

-Grant

On Nov 4, 2010, at 3:35 PM, Jeff Eastman wrote:

> We have historically tracked the latest versions of Hadoop pretty soon after
> they have been available. If the tests run on 0.21 and it has the
> CompositeInputFormat then I'd be +1 to move forward. Hopefully there will be
> a Cloudera version that tracks it pretty soon too, else users will have to
> build their own AMIs again.
>
> -----Original Message-----
> From: Shannon Quinn (JIRA) [mailto:[email protected]]
> Sent: Thursday, November 04, 2010 12:27 PM
> To: [email protected]
> Subject: [jira] Commented: (MAHOUT-537) Bring DistributedRowMatrix into
> compliance with Hadoop 0.20.2
>
> [https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928314#action_12928314]
>
> Shannon Quinn commented on MAHOUT-537:
> --------------------------------------
>
> Something worth discussing: Hadoop just released version 0.21.0, which
> re-includes the updated CompositeInputFormat that was missing in 0.20.2 and
> deprecated in 0.18. I'm going to install v0.21 and see if tests pass on the
> trunk, but provided they do then I'm wondering if I should go ahead and
> implement this patch using Hadoop 0.21. Any thoughts?
>
>> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
>> -------------------------------------------------------------
>>
>>                 Key: MAHOUT-537
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>>             Project: Mahout
>>          Issue Type: Improvement
>>    Affects Versions: 0.4
>>            Reporter: Shannon Quinn
>>            Assignee: Shannon Quinn
>>         Attachments: MAHOUT-537.patch
>>
>> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API,
>> in particular eliminate dependence on the deprecated JobConf, using instead
>> the separate Job and Configuration objects.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
