Second that.

As far as I know (and our company does the same), pretty much
everybody is using CDH3b3 in production, which is technically based on the
0.20 tree. Moving to 0.21 would, IMO, make Mahout incompatible with a lot
of what is running in production.

Also, it's not just Hadoop: it's Hive, Pig, HBase, etc. Pig, for one, has
not been ported to 0.21, and from what I've heard there isn't even an effort
on the horizon to break ground in that direction. That alone would keep a lot
of folks from moving to 0.21. A lot of people are locked in to Cloudera's
distro and to ecosystem components that are at various degrees of readiness
for 0.21 (or not ready at all).

I personally prefer to use the new API from CDH3b3 (the append API and HBase
enhancements are especially hard to ignore), but I imagine we will not switch
to 0.21 until there's at least a stable Pig version for it. My guess is that
this reasoning is pretty typical.

Thanks.

-Dmitriy

On Fri, Nov 5, 2010 at 7:09 AM, Grant Ingersoll <[email protected]> wrote:

> I didn't get a strong sense from the Hadoop community that 0.21 is all that
> well baked.  To quote the website:
> "This release contains many improvements, new features, bug fixes and
> optimizations. It has not undergone testing at scale and should not be
> considered stable or suitable for production. This release is being
> classified as a minor release, which means that it should be API compatible
> with 0.20.2."
>
> If they can't give it a vote of confidence, then I don't think we should
> either.
>
> It also reminds me that I think we should at a minimum have a conversation
> about ways we might insulate ourselves a little bit from Hadoop while still
> harnessing all of its power.  Ted and I talked about it a bit at the Bay
> Area meetup we had a few months ago.  The Plume/Flume stuff seems promising
> for helping with that as well as giving some other benefits, but that relies
> on us having an open source version of Flume (which Ted and others have
> started).  I don't know that it is all that practical in short term and I'm
> not proposing any rewrites at this point, but we should consider it as
> working at that layer might let us plug in different backends
> that are better performing given certain setups (local, small cluster, large
> cluster).  Such a bit of insulation might allow us to plug in other
> capabilities as well.  One of the things Hadoop has spawned is a whole lot
> more interest in these kinds of capabilities and I fully expect to see
> new/related paradigms coming out.  Obviously, we aren't just going to jump
> on anything, but we can think about ways we might be able to plug them
> in.  Thoughts?
>
> -Grant
>
> On Nov 4, 2010, at 3:35 PM, Jeff Eastman wrote:
>
> > We have historically tracked the latest versions of Hadoop pretty soon
> after they have been available. If the tests run on 0.21 and it has the
> CompositeInputFormat then I'd be +1 to move forward. Hopefully there will be
> a Cloudera version that tracks it pretty soon too, else users will have to
> build their own AMIs again.
> >
> > -----Original Message-----
> > From: Shannon Quinn (JIRA) [mailto:[email protected]]
> > Sent: Thursday, November 04, 2010 12:27 PM
> > To: [email protected]
> > Subject: [jira] Commented: (MAHOUT-537) Bring DistributedRowMatrix into
> compliance with Hadoop 0.20.2
> >
> >
> >    [
> https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928314#action_12928314]
> >
> > Shannon Quinn commented on MAHOUT-537:
> > --------------------------------------
> >
> > Something worth discussing: Hadoop just released version 0.21.0, which
> re-includes the updated CompositeInputFormat that was missing in 0.20.2 and
> deprecated in 0.18. I'm going to install v0.21 and see if tests pass on the
> trunk, but provided they do then I'm wondering if I should go ahead and
> implement this patch using Hadoop 0.21. Any thoughts?
> >
> >> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> >> -------------------------------------------------------------
> >>
> >>                Key: MAHOUT-537
> >>                URL: https://issues.apache.org/jira/browse/MAHOUT-537
> >>            Project: Mahout
> >>         Issue Type: Improvement
> >>   Affects Versions: 0.4
> >>           Reporter: Shannon Quinn
> >>           Assignee: Shannon Quinn
> >>        Attachments: MAHOUT-537.patch
> >>
> >>
> >> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2
> API, in particular eliminate dependence on the deprecated JobConf, using
> instead the separate Job and Configuration objects.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
