I didn't get a strong sense from the Hadoop community that 0.21 is all that 
well baked.  To quote the website:
"This release contains many improvements, new features, bug fixes and 
optimizations. It has not undergone testing at scale and should not be 
considered stable or suitable for production. This release is being classified 
as a minor release, which means that it should be API compatible with 0.20.2."

If they can't give it a vote of confidence, then I don't think we should either.

It also reminds me that I think we should at a minimum have a conversation 
about ways we might insulate ourselves a little bit from Hadoop while still 
harnessing all of its power.  Ted and I talked about it a bit at the Bay Area 
meetup we had a few months ago.  The Plume/Flume stuff seems promising for 
helping with that as well as giving some other benefits, but that relies on us 
having an open source version of Flume (which Ted and others have started).  I 
don't know that it is all that practical in the short term and I'm not proposing 
any rewrites at this point, but we should consider it, as working at that layer 
might allow us to plug in different backends that are better performing 
given certain setups (local, small cluster, large cluster).  Such a bit of 
insulation might allow us to plug in other capabilities as well.  One of the 
things Hadoop has spawned is a whole lot more interest in these kinds of 
capabilities and I fully expect to see new/related paradigms coming out.  
Obviously, we aren't just going to jump on anything, but we should think about 
ways we might be able to plug them in.  Thoughts?
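
To make the insulation idea a bit more concrete, here is a rough sketch in 
plain Java of what "working at that layer" could look like.  Every name in it 
(Backend, LocalBackend, ParallelBackend, Backends.forClusterSize) is made up 
for illustration and exists in neither Mahout nor Hadoop, and the parallel 
stream is only a stand-in for a real cluster-backed implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical abstraction: algorithm code talks to this interface only,
// never to a specific execution engine.
interface Backend {
    String name();
    <T, R> List<R> map(List<T> input, Function<T, R> fn);
}

// Runs everything sequentially in-process; stands in for a "local" setup.
class LocalBackend implements Backend {
    public String name() { return "local"; }

    public <T, R> List<R> map(List<T> input, Function<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }
}

// Uses a parallel stream as a placeholder for a cluster-backed engine.
// A real version would submit work to Hadoop or some other framework.
class ParallelBackend implements Backend {
    public String name() { return "parallel"; }

    public <T, R> List<R> map(List<T> input, Function<T, R> fn) {
        // Collecting to a List keeps the input order even when run in parallel.
        return input.parallelStream().map(fn).collect(Collectors.toList());
    }
}

class Backends {
    // Illustrative selection policy only; the threshold is arbitrary.
    static Backend forClusterSize(int nodes) {
        return nodes <= 1 ? new LocalBackend() : new ParallelBackend();
    }
}
```

The point is only that callers pick a backend once and the algorithm code 
stays the same, e.g. Backends.forClusterSize(1).map(rows, fn) for a local run.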

-Grant

On Nov 4, 2010, at 3:35 PM, Jeff Eastman wrote:

> We have historically tracked the latest versions of Hadoop pretty soon after 
> they have been available. If the tests run on 0.21 and it has the 
> CompositeInputFormat then I'd be +1 to move forward. Hopefully there will be 
> a Cloudera version that tracks it pretty soon too, else users will have to 
> build their own AMIs again.
> 
> -----Original Message-----
> From: Shannon Quinn (JIRA) [mailto:[email protected]] 
> Sent: Thursday, November 04, 2010 12:27 PM
> To: [email protected]
> Subject: [jira] Commented: (MAHOUT-537) Bring DistributedRowMatrix into 
> compliance with Hadoop 0.20.2
> 
> 
>    [ 
> https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928314#action_12928314
>  ] 
> 
> Shannon Quinn commented on MAHOUT-537:
> --------------------------------------
> 
> Something worth discussing: Hadoop just released version 0.21.0, which 
> re-includes the updated CompositeInputFormat that was missing in 0.20.2 and 
> deprecated in 0.18. I'm going to install v0.21 and see if tests pass on the 
> trunk, but provided they do then I'm wondering if I should go ahead and 
> implement this patch using Hadoop 0.21. Any thoughts?
> 
>> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
>> -------------------------------------------------------------
>> 
>>                Key: MAHOUT-537
>>                URL: https://issues.apache.org/jira/browse/MAHOUT-537
>>            Project: Mahout
>>         Issue Type: Improvement
>>   Affects Versions: 0.4
>>           Reporter: Shannon Quinn
>>           Assignee: Shannon Quinn
>>        Attachments: MAHOUT-537.patch
>> 
>> 
>> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API, 
>> in particular eliminate dependence on the deprecated JobConf, using instead 
>> the separate Job and Configuration objects.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
