Useful contributions. I want to find out one more thing: has Hadoop been successfully simulated so far, maybe using Opnet or ns2?
Regards,
kobina.

On 18 September 2011 03:37, Michael Segel <michael_se...@hotmail.com> wrote:
>
> Gee Tom,
> No disrespect, but I don't believe you have any personal practical
> experience in designing and building out clusters or putting them to the
> test.
>
> Now to the points that Brian raised...
>
> 1) SPOF... it sounds great on paper. Some FUD to scare someone away from
> Hadoop. But in reality you can mitigate your risks by setting up RAID on
> your NN/HM node. You can also NFS-mount a copy to your SN (or whatever
> they're calling it these days...) - see the config sketch after this
> thread excerpt. Or you can go with MapR, which has redesigned HDFS in a
> way that removes this problem. But with Apache Hadoop or Cloudera's
> release, losing your NN is rare. Yes, it can happen, but it is not your
> greatest risk. (Not by a long shot.)
>
> 2) Data loss.
> You can mitigate this as well - do I need to go through all of the
> options and DR/BCP planning? Sure, there's always a chance that you have
> some luser who does something brain-dead, but this is true of all
> databases and systems. (I know I can probably recount some of IBM's
> Informix and DB2 data-loss issues, but that's a topic for another
> time. ;-)
>
> I can't speak for Brian, but I don't think he's trivializing it. In
> fact, I think he's doing a fine job of level-setting expectations.
>
> And if you talk to Ted Dunning of MapR, I'm sure he'll point out that
> their current release also addresses points 3 and 4, again making those
> risks moot. (At least if you're using MapR.)
>
> -Mike
>
> > Subject: Re: risks of using Hadoop
> > From: tdeut...@us.ibm.com
> > Date: Sat, 17 Sep 2011 17:38:27 -0600
> > To: common-user@hadoop.apache.org
> >
> > I disagree, Brian - data loss and system downtime (both potentially
> > non-trivial) should not be taken lightly. Use cases and thus
> > availability requirements do vary, but I would not encourage anyone to
> > shrug them off as "overblown", especially as Hadoop becomes more
> > production-oriented in its utilization.
> >
> > ---------------------------------------
> > Sent from my Blackberry so please excuse typing and spelling errors.
> >
> > ----- Original Message -----
> > From: Brian Bockelman [bbock...@cse.unl.edu]
> > Sent: 09/17/2011 05:11 PM EST
> > To: common-user@hadoop.apache.org
> > Subject: Re: risks of using Hadoop
> >
> > On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote:
> >
> > > Hi Kobina,
> > >
> > > Some experiences which may be helpful for you with respect to HDFS:
> > >
> > > 1. Selecting the correct version.
> > > I recommend the 0.20.x line: it is a pretty stable series, most
> > > organizations prefer it, and it is well tested. Don't go for the
> > > 0.21 version - it is not a stable release, so it is a risk.
> > >
> > > 2. You should perform thorough tests with your customer operations.
> > > (Of course you will do this :-))
> > >
> > > 3. The 0.20.x versions have the SPOF problem.
> > > If the NameNode goes down, you will lose the data. One way of
> > > recovering is via the SecondaryNameNode: you can recover up to the
> > > last checkpoint, but manual intervention is required (a recovery
> > > sketch follows this thread excerpt). In the latest trunk the SPOF
> > > will be addressed by HDFS-1623.
> > >
> > > 4. 0.20.x NameNodes cannot scale out. Federation changes are
> > > included in later versions (I think in 0.22). This may not be a
> > > problem for your cluster, but please consider this aspect as well.
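For reference, here is a minimal sketch of the mitigations Mike (point 1)
and Uma (point 3) describe, for a 0.20.x cluster. All hostnames and paths
below are hypothetical, not taken from the thread. The NameNode can be told
to write its fsimage and edit log to several directories at once, one of
them an NFS mount, so a copy of the metadata survives the loss of the
NameNode box:

    <!-- hdfs-site.xml: illustrative dfs.name.dir setup -->
    <property>
      <name>dfs.name.dir</name>
      <!-- comma-separated list; the NameNode writes its metadata to every
           directory, so the NFS copy survives a local disk failure -->
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>

The manual recovery Uma mentions then looks roughly like this: point
fs.checkpoint.dir (in core-site.xml) at a copy of the SecondaryNameNode's
checkpoint, give the replacement NameNode empty dfs.name.dir directories,
and run:

    # load the last checkpoint into a fresh NameNode;
    # any edits made after that checkpoint are lost
    hadoop namenode -importCheckpoint

For Mike's point 2, DR/BCP planning usually includes a second cluster kept
in sync with periodic distcp runs (cluster names and paths are again
made up):

    # MapReduce-driven copy from the production to the backup cluster
    hadoop distcp hdfs://prod-nn:8020/data hdfs://backup-nn:8020/data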
> > With respect to (3) and (4) - these are often completely overblown
> > for many Hadoop use cases. If you use Hadoop as originally designed
> > (large-scale batch data processing), they likely don't matter.
> >
> > If you're looking at some of the newer use cases (low-latency stuff
> > or time-critical processing), or if you architect your solution poorly
> > (lots of small files - see the sketch at the end of this excerpt),
> > these issues become relevant. Another case where I see folks get
> > frustrated is using Hadoop as a "plain old batch system"; for non-data
> > workflows, it doesn't measure up against specialized systems.
> >
> > You really want to make sure that Hadoop is the best tool for your
> > job.
> >
> > Brian
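On Brian's "lots of small files" point: every file, directory, and block is
an object held in the NameNode's memory (on the order of 150 bytes each),
so tens of millions of small files can exhaust a single 0.20.x NameNode
long before the disks fill up. One standard workaround is to pack small
files into a Hadoop Archive; a sketch with made-up paths (note that newer
releases change the syntax to require a -p <parent> argument):

    # pack /user/kobina/logs into a single archive file
    hadoop archive -archiveName logs.har /user/kobina/logs /user/kobina/archived

    # the packed files stay readable through the har: scheme
    hadoop fs -lsr har:///user/kobina/archived/logs.har

SequenceFiles (many small records rolled into one large file) are the other
common fix for the same problem.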