Useful contributions. I want to find out one more thing: has Hadoop been successfully simulated so far, maybe using Opnet or ns2?
Regards,
kobina.

On 18 September 2011 03:37, Michael Segel <michael_se...@hotmail.com> wrote:
>
> Gee Tom,
> No disrespect, but I don't believe you have any personal practical
> experience in designing and building out clusters or putting them to the
> test.
>
> Now to the points that Brian raised...
>
> 1) SPOF... it sounds great on paper. Some FUD to scare someone away from
> Hadoop. But in reality you can mitigate your risks by setting up RAID on
> your NN/HM node. You can also NFS-mount a copy to your SN (or whatever
> they're calling it these days...) - see the config sketch after this
> thread excerpt. Or you can go with MapR, which has redesigned HDFS in a
> way that removes this problem. But with Apache Hadoop or Cloudera's
> release, losing your NN is rare. Yes, it can happen, but it is not your
> greatest risk. (Not by a long shot.)
>
> 2) Data loss.
> You can mitigate this as well - do I need to go through all of the
> options and DR/BCP planning? Sure, there's always a chance that you have
> some luser who does something brain-dead, but this is true of all
> databases and systems. (I know I can probably recount some of IBM's
> Informix and DB2 data-loss issues, but that's a topic for another
> time. ;-)
>
> I can't speak for Brian, but I don't think he's trivializing it. In
> fact, I think he's doing a fine job of level-setting expectations.
>
> And if you talk to Ted Dunning of MapR, I'm sure he'll point out that
> their current release also addresses points 3 and 4, again making those
> risks moot. (At least if you're using MapR.)
>
> -Mike
>
> > Subject: Re: risks of using Hadoop
> > From: tdeut...@us.ibm.com
> > Date: Sat, 17 Sep 2011 17:38:27 -0600
> > To: common-user@hadoop.apache.org
> >
> > I disagree, Brian - data loss and system downtime (both potentially
> > non-trivial) should not be taken lightly. Use cases and thus
> > availability requirements do vary, but I would not encourage anyone to
> > shrug them off as "overblown", especially as Hadoop becomes more
> > production-oriented in its utilization.
> >
> > ---------------------------------------
> > Sent from my Blackberry so please excuse typing and spelling errors.
> >
> > ----- Original Message -----
> > From: Brian Bockelman [bbock...@cse.unl.edu]
> > Sent: 09/17/2011 05:11 PM EST
> > To: common-user@hadoop.apache.org
> > Subject: Re: risks of using Hadoop
> >
> > On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote:
> >
> > > Hi Kobina,
> > >
> > > Some experiences which may be helpful for you with respect to HDFS:
> > >
> > > 1. Selecting the correct version.
> > > I recommend the 0.20.x line: it is a pretty stable series, most
> > > organizations prefer it, and it is well tested. Don't go for the
> > > 0.21 version - it is not a stable release, so it is a risk.
> > >
> > > 2. You should perform thorough tests with your customer operations.
> > > (Of course you will do this :-))
> > >
> > > 3. The 0.20.x versions have the SPOF problem.
> > > If the NameNode goes down, you will lose the data. One way of
> > > recovering is via the SecondaryNameNode: you can recover up to the
> > > last checkpoint, but manual intervention is required (a recovery
> > > sketch follows this thread excerpt). In the latest trunk the SPOF
> > > will be addressed by HDFS-1623.
> > >
> > > 4. 0.20.x NameNodes cannot scale out. Federation changes are
> > > included in later versions (I think in 0.22). This may not be a
> > > problem for your cluster, but please consider this aspect as well.
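For reference, here is a minimal sketch of the mitigations Mike (point 1)
and Uma (point 3) describe, for a 0.20.x cluster. All hostnames and paths
below are hypothetical, not taken from the thread. The NameNode can be told
to write its fsimage and edit log to several directories at once, one of
them an NFS mount, so a copy of the metadata survives the loss of the
NameNode box:

    <!-- hdfs-site.xml: illustrative dfs.name.dir setup -->
    <property>
      <name>dfs.name.dir</name>
      <!-- comma-separated list; the NameNode writes its metadata to every
           directory, so the NFS copy survives a local disk failure -->
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>

The manual recovery Uma mentions then looks roughly like this: point
fs.checkpoint.dir (in core-site.xml) at a copy of the SecondaryNameNode's
checkpoint, give the replacement NameNode empty dfs.name.dir directories,
and run:

    # load the last checkpoint into a fresh NameNode;
    # any edits made after that checkpoint are lost
    hadoop namenode -importCheckpoint

For Mike's point 2, DR/BCP planning usually includes a second cluster kept
in sync with periodic distcp runs (cluster names and paths are again
made up):

    # MapReduce-driven copy from the production to the backup cluster
    hadoop distcp hdfs://prod-nn:8020/data hdfs://backup-nn:8020/data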
> > With respect to (3) and (4) - these are often completely overblown
> > for many Hadoop use cases. If you use Hadoop as originally designed
> > (large-scale batch data processing), they likely don't matter.
> >
> > If you're looking at some of the newer use cases (low-latency stuff
> > or time-critical processing), or if you architect your solution poorly
> > (lots of small files - see the sketch at the end of this excerpt),
> > these issues become relevant. Another case where I see folks get
> > frustrated is using Hadoop as a "plain old batch system"; for non-data
> > workflows, it doesn't measure up against specialized systems.
> >
> > You really want to make sure that Hadoop is the best tool for your
> > job.
> >
> > Brian
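On Brian's "lots of small files" point: every file, directory, and block is
an object held in the NameNode's memory (on the order of 150 bytes each),
so tens of millions of small files can exhaust a single 0.20.x NameNode
long before the disks fill up. One standard workaround is to pack small
files into a Hadoop Archive; a sketch with made-up paths (note that newer
releases change the syntax to require a -p <parent> argument):

    # pack /user/kobina/logs into a single archive file
    hadoop archive -archiveName logs.har /user/kobina/logs /user/kobina/archived

    # the packed files stay readable through the har: scheme
    hadoop fs -lsr har:///user/kobina/archived/logs.har

SequenceFiles (many small records rolled into one large file) are the other
common fix for the same problem.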