Thanks, I will try a safer place for the DFS.

Jeff
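For reference, moving the DFS off /tmp comes down to overriding a few properties in hadoop-site.xml. A minimal sketch, assuming a persistent local directory such as /var/hadoop exists on every node (the path is only an example; dfs.name.dir and dfs.data.dir default to subdirectories of hadoop.tmp.dir if left unset):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
    <description>Base directory for Hadoop's temporary and DFS data;
    the default is /tmp/hadoop-${user.name}.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
    <description>Where the namenode keeps the filesystem image.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
    <description>Where each datanode stores its blocks.</description>
  </property>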
-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2008 10:04 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop

The /tmp default has caught us once or twice too. Now we put the files
elsewhere.

[EMAIL PROTECTED] wrote:
>> The DFS is stored in /tmp on each box.
>> The developers who own the machines occasionally reboot and reprofile them
>
> Won't you lose your blocks after a reboot, since /tmp gets cleaned up?
> Could this be the reason you see data corruption? A good idea is to
> configure the DFS to be any place other than /tmp.
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Jeff Eastman <[EMAIL PROTECTED]>
> To: hadoop-user@lucene.apache.org
> Sent: Wednesday, January 16, 2008 9:32:41 AM
> Subject: Platform reliability with Hadoop
>
> I've been running Hadoop 0.14.4 and, more recently, 0.15.2 on a dozen
> machines in our CUBiT array for the last month. During this time I have
> experienced two major data corruption losses on relatively small amounts
> of data (<50 GB), which makes me wonder about the suitability of this
> platform for hosting Hadoop. CUBiT is one of our products for managing a
> pool of development servers, allowing developers to check out machines,
> install various OS profiles on them, and monitor their utilization via
> the web. With most machines reporting very low utilization, it seemed a
> natural place to run Hadoop in the background. I have an NFS-mounted
> account on all of the machines and have installed Hadoop there. The DFS
> is stored in /tmp on each box. The developers who own the machines
> occasionally reboot and reprofile them, but this occurs infrequently and
> does not clobber /tmp. Hadoop is designed to deal with slave failures of
> this nature, though this platform may well be an acid test.
>
> My initial cloud was configured with a replication factor of 3, and I
> have increased that to 4 in hopes of improving data reliability in the
> face of these more-prevalent slave outages. Ted Dunning has suggested
> aggressive rebalancing in his recent posts, and I have done this by
> increasing replication to 5 (from 3) and then dropping it to 4. Are
> there other rebalancing or configuration techniques that might improve
> my data reliability? Or is this platform just too unstable to be a good
> fit for Hadoop?
>
> Jeff
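For the rebalancing trick Jeff describes in the quoted message above (bumping replication to 5 and then dropping back to 4), the usual way is per-path from the DFS shell. A rough sketch, assuming the data lives under /user/jeff (the path is illustrative, and flags vary by release; check bin/hadoop dfs -help setrep for the exact usage in your version):

  # raise replication so the namenode schedules extra copies on healthy nodes
  bin/hadoop dfs -setrep -R 5 /user/jeff
  # once the extra replicas exist, drop back to the intended factor
  bin/hadoop dfs -setrep -R 4 /user/jeff

Note that -setrep only changes existing files; the replication factor for newly written files comes from the dfs.replication property in hadoop-site.xml, so that should be raised as well if the whole cluster is meant to run at 4.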