A lot of it depends on your staff and their experience.
Maybe they don't have Hadoop experience, but if they have worked with large
databases, data warehouses, etc., they can apply those skills
and provide a lot of help.
If you have Linux admins, system admins, or network admins with years of
experience, they will be a goldmine. At the other end, database
developers who know SQL, programmers who know Java, and so on can really
help staff up your 'big data' team. Having a few people who know ETL would
be great too.

The biggest problem I've run into is how big the Hadoop project/team is,
or is not. Sometimes it's just an 'experimental' department, and therefore
half the people are only 25-50 percent available to help out. If they
aren't that knowledgeable about Hadoop, it tends to become one of those
not-enough-hours-in-the-day scenarios, and the few people dedicated to the
Hadoop project(s) bear the brunt of the work.

It's like any ecosystem. To do it right, you might need system/network
admins, a storage person who actually knows how to set up the proper storage
architecture, maybe a security expert, a few programmers, and a few data
people. If you're adding analytics, that's another group. Of course,
most companies outside the Googles and Facebooks of the world will have only
a few people dedicated to Hadoop. That means you need somebody who knows
storage, networking, Linux, system administration, security, and maybe
other things (e.g. if you have a firewall issue, somebody needs to figure
out how to make it work through or around it), and then you need some
programmers who either know MapReduce or can pretty much figure it out
because they've done Java for years.

Peter J

On 2/23/12 10:17 AM, "Pavel Frolov" <pfro...@gmail.com> wrote:

>Hi,
>
>We are going into 24x7 production soon and we are considering whether we
>need vendor support or not.  We use a free vendor distribution of Cluster
>Provisioning + Hadoop + HBase and looked at their Enterprise version but it
>is very expensive for the value it provides (additional functionality +
>support), given that we've already ironed out many of our performance and
>tuning issues on our own and with generous help from the community (e.g.
>all of you).
>
>So, I wanted to run it through the community to see if anybody can share
>their experience of running a Hadoop cluster (50+ nodes with Apache
>releases or Vendor distributions) in production, with in-house support
>only, and how difficult it was.  How many people were involved, etc.
>
>Regards,
>Pavel
