Re: [Nutch-dev] Hadoop 0.11.2 vs. 0.12.1

Dennis Kubes Sun, 11 Mar 2007 21:20:35 -0800

> It looks like we might want to at least give it a try then, with the worst
> possible case of Nutch users having to keep speculative execution disabled
> if it causes grief again. If other problems arise, then we can just revert
> back to 0.11.2 which seems to be stable in terms of all the Nutch
> operations.
>
>
> ----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Sunday, March 11, 2007 4:34:38 PM
> Subject: Hadoop 0.11.2 vs. 0.12.1
>
>
> Hi all,
>
> After our discussion about which Hadoop release to use for the upcoming
> Nutch release, I decided to ask around on the Hadoop mailing list. The
> message was clear that we should go with 0.12.1 - see below:
>
> Owen O'Malley wrote:
>>
>> On Mar 10, 2007, at 12:32 AM, Andrzej Bialecki wrote:
>>
>>>> I think the experience on big clusters at Yahoo! is that 0.12.1
>>>> should be more stable than 0.11.2, but others can confirm that.
>>>
>>> Hm.. That's not the impression I have from JIRA and the mailing list.
>>> My impression is that even though 0.12.1 is more robust in some
>>> situations, the significant changes (checksum filesystem, speculative
>>> execution, in memory sorting, improved map output handling, etc, etc)
>>> made between these releases introduced many subtle bugs which only
>>> now start coming into light.
>>
>> We never upgraded our main clusters to 11.2 because it never
>> stabilized to our satisfaction, which is why I was proposing an 11.3.
>> However, 12.1 is looking pretty good with the exception of a couple
>> of bugs and we decided to hold out for 12.1. At this point, if I was
>> going to 11, I'd want a lot of the fixes that have been done in between.
>
> 0.12.x release has speculative execution turned on by default, but I
> remember that there were places in Nutch that would break when using
> PhasedFileSystem (which is what Hadoop uses when run in that mode). I'm
> afraid there might be other issues here as well - no one tested Nutch
> with 0.12 to be sure that it works ok.
>
> On the other hand, I only tested 0.11.2 in a limited production env., so
> there may be other bugs lurking there that Owen referred to, which show
> up when you run larger jobs (or different jobs).
>
> What do you think?


I agree there may be subtle bugs.

I can run, say, a full dmoz crawl (~5M pages) with Nutch trunk and Hadoop
0.12.1 on a small cluster of 5 machines, if that would help. We have already
done some crawls of more than 100K URLs with 0.11.2 without problems. I say
let's test it, and if there aren't any significant issues, let's go with
0.12.1 if the Hadoop team thinks it will be more stable.

One question, though: are there any concerns about upgrading existing
clusters, as opposed to starting new fetches?

Dennis Kubes

>
> --
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com


