Thanks, I will try a safer place for the DFS.

Jeff
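For reference, moving the DFS off /tmp comes down to overriding a few properties in hadoop-site.xml. A minimal sketch, assuming a persistent local directory such as /var/hadoop exists on every node (the path is only an example; dfs.name.dir and dfs.data.dir default to subdirectories of hadoop.tmp.dir if left unset):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
    <description>Base directory for Hadoop's temporary and DFS data;
    the default is /tmp/hadoop-${user.name}.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
    <description>Where the namenode keeps the filesystem image.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
    <description>Where each datanode stores its blocks.</description>
  </property>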
-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2008 10:04 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop

The /tmp default has caught us once or twice too. Now we put the files
elsewhere.

[EMAIL PROTECTED] wrote:
>> The DFS is stored in /tmp on each box.
>> The developers who own the machines occasionally reboot and reprofile them
>
> Won't you lose your blocks after a reboot, since /tmp gets cleaned up?
> Could this be the reason you see data corruption? A good idea is to
> configure the DFS to be any place other than /tmp.
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Jeff Eastman <[EMAIL PROTECTED]>
> To: hadoop-user@lucene.apache.org
> Sent: Wednesday, January 16, 2008 9:32:41 AM
> Subject: Platform reliability with Hadoop
>
> I've been running Hadoop 0.14.4 and, more recently, 0.15.2 on a dozen
> machines in our CUBiT array for the last month. During this time I have
> experienced two major data corruption losses on relatively small amounts
> of data (<50 GB), which makes me wonder about the suitability of this
> platform for hosting Hadoop. CUBiT is one of our products for managing a
> pool of development servers, allowing developers to check out machines,
> install various OS profiles on them, and monitor their utilization via
> the web. With most machines reporting very low utilization, it seemed a
> natural place to run Hadoop in the background. I have an NFS-mounted
> account on all of the machines and have installed Hadoop there. The DFS
> is stored in /tmp on each box. The developers who own the machines
> occasionally reboot and reprofile them, but this occurs infrequently and
> does not clobber /tmp. Hadoop is designed to deal with slave failures of
> this nature, though this platform may well be an acid test.
>
> My initial cloud was configured with a replication factor of 3, and I
> have increased that to 4 in hopes of improving data reliability in the
> face of these more-prevalent slave outages. Ted Dunning has suggested
> aggressive rebalancing in his recent posts, and I have done this by
> increasing replication to 5 (from 3) and then dropping it to 4. Are
> there other rebalancing or configuration techniques that might improve
> my data reliability? Or is this platform just too unstable to be a good
> fit for Hadoop?
>
> Jeff
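For the rebalancing trick Jeff describes in the quoted message above (bumping replication to 5 and then dropping back to 4), the usual way is per-path from the DFS shell. A rough sketch, assuming the data lives under /user/jeff (the path is illustrative, and flags vary by release; check bin/hadoop dfs -help setrep for the exact usage in your version):

  # raise replication so the namenode schedules extra copies on healthy nodes
  bin/hadoop dfs -setrep -R 5 /user/jeff
  # once the extra replicas exist, drop back to the intended factor
  bin/hadoop dfs -setrep -R 4 /user/jeff

Note that -setrep only changes existing files; the replication factor for newly written files comes from the dfs.replication property in hadoop-site.xml, so that should be raised as well if the whole cluster is meant to run at 4.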