Hi.

The crawlers are _very_ threaded, but no, we use our own threading framework,
since multi-threaded mappers were not available in hadoop-core at the time.

Crawlers normally just wait a lot on clients, which induces very little CPU
load, but they consume some memory due to the parallelism.
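
A rough sketch of that behaviour, not the actual crawler framework: the
`fetch` function, the URLs and the 50 ms latency are made up for illustration,
and the network wait is simulated with a sleep. It only shows that many
threads waiting on remote hosts finish in about one round-trip of wall time
while using hardly any CPU.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed per-request wait in seconds (illustrative only)


def fetch(url):
    """Stand-in for an HTTP fetch: the thread sleeps as if waiting on a host."""
    time.sleep(LATENCY)
    return url, 200


def crawl(urls, threads):
    """Fetch all URLs with a fixed-size thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, urls))


urls = ["http://example.com/page/%d" % i for i in range(20)]

wall_start = time.perf_counter()
cpu_start = time.process_time()
results = crawl(urls, threads=20)
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# Wall time is dominated by waiting; CPU time stays small in comparison.
print("fetched %d pages, wall %.3fs, cpu %.3fs" % (len(results), wall, cpu))
```

Each thread carries its own stack and buffers, so the cost of this kind of
parallelism shows up as memory rather than CPU.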

//Marcus

On Sat, Jun 27, 2009 at 6:10 PM, jason hadoop <jason.had...@gmail.com> wrote:

> How about multi-threaded mappers?
> Multi-threaded mappers are ideal for map tasks that are non-locally I/O
> bound, with many distinct endpoints.
> You can also control the thread count on a per job basis.
>
> On Sat, Jun 27, 2009 at 8:26 AM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>
> > The current argument against increasing num-mappers is that the machines
> > will run out of memory (OOM), and since a lot of the jobs are crawlers I
> > need more IP addresses so I don't get banned :)
> >
> > Thing is that we currently have Solr on the very same machines, and
> > data-nodes as well, so I can only give the MR nodes about 1G of memory
> > since I need Solr to have 4G...
> >
> > Now, I realize I will get some obvious and justified criticism of the
> > layout of this architecture, but I'm a little limited in budget, and so
> > then is the architecture :)
> >
> > However, is it wise to have the MR tasks on the same nodes as the
> > data-nodes, or should I split the architecture? I mean, the data-nodes
> > perhaps need more disk I/O and the MR nodes more memory and CPU?
> >
> > I'm trying to find a sweet-spot hardware spec for those two roles.
> >
> > //Marcus
> >
> >
> >
> > On Sat, Jun 27, 2009 at 4:24 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> >
> > > Hey Marcus,
> > >
> > > Are you recording the data rates coming out of HDFS?  Since you have
> > > such low CPU utilization, I'd look at boxes utterly packed with big
> > > hard drives (also, why are you using RAID1 for Hadoop??).
> > >
> > > You can get 1U boxes with 4 drive bays or 2U boxes with 12 drive bays.
> > >  Based on the data rates you see, make the call.
> > >
> > > On the other hand, what's the argument against running 3x more mappers
> > > per box?  It seems that your boxes still have more headroom to use --
> > > there's no I/O wait.
> > >
> > > Brian
> > >
> > >
> > > On Jun 26, 2009, at 4:43 PM, Marcus Herou wrote:
> > >
> > > >> Hi.
> > >>
> > > >> We have a deployment of 10 Hadoop servers and I now need more mapping
> > > >> capability (no, not just adding more mappers per instance) since I
> > > >> have so many jobs running. Now I am wondering what I should aim for:
> > > >> memory, CPU or disk? "How long is a piece of string?", you might say.
> > >>
> > > >> A typical server is currently using about 15-20% CPU on a quad-core
> > > >> 2.4GHz machine with 8GB RAM and 2 SATA 500GB disks in RAID1.
> > >>
> > >> Some specs below.
> > >>
> > >>> mpstat 2 5
> > >>>
> > >> Linux 2.6.24-19-server (mapreduce2)     06/26/2009
> > >>
> > > >> 11:36:13 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> > > >> 11:36:15 PM  all   22.82    0.00    3.24    1.37    0.62    2.49    0.00   69.45   8572.50
> > > >> 11:36:17 PM  all   13.56    0.00    1.74    1.99    0.62    2.61    0.00   79.48   8075.50
> > > >> 11:36:19 PM  all   14.32    0.00    2.24    1.12    1.12    2.24    0.00   78.95   9219.00
> > > >> 11:36:21 PM  all   14.71    0.00    0.87    1.62    0.25    1.75    0.00   80.80   8489.50
> > > >> 11:36:23 PM  all   12.69    0.00    0.87    1.24    0.50    0.75    0.00   83.96   5495.00
> > > >> Average:     all   15.62    0.00    1.79    1.47    0.62    1.97    0.00   78.53   7970.30
> > >>
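
For reference, the Average row of the mpstat output above can be recomputed
from the five samples. This is only a minimal sketch: the variable names are
made up for illustration, and the numbers are copied from the mpstat output
(intr/s left out).

```python
FIELDS = ("user", "nice", "sys", "iowait", "irq", "soft", "steal", "idle")

# One tuple per mpstat sample, in the column order given by FIELDS.
samples = [
    (22.82, 0.00, 3.24, 1.37, 0.62, 2.49, 0.00, 69.45),
    (13.56, 0.00, 1.74, 1.99, 0.62, 2.61, 0.00, 79.48),
    (14.32, 0.00, 2.24, 1.12, 1.12, 2.24, 0.00, 78.95),
    (14.71, 0.00, 0.87, 1.62, 0.25, 1.75, 0.00, 80.80),
    (12.69, 0.00, 0.87, 1.24, 0.50, 0.75, 0.00, 83.96),
]


def column_means(rows):
    """Average each column across all sample rows."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


means = dict(zip(FIELDS, column_means(samples)))
busy = 100.0 - means["idle"]  # everything that is not idle time

for name in FIELDS:
    print("%-7s %6.2f" % (name, means[name]))
print("%-7s %6.2f" % ("busy", busy))
```

The box is busy about 21.5% of the time, of which roughly 1.5 points are
iowait, so neither CPU nor disk looks saturated in this sample.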
> > > >> What I am thinking is... Is it wiser to go for many of these cheap
> > > >> boxes with 8GB of RAM, or should I, for instance, focus on machines
> > > >> which can give more I/O throughput?
> > >>
> > > >> I know that these things are hard, but perhaps someone has already
> > > >> drawn some conclusions the pragmatic way.
> > >>
> > >> Kindly
> > >>
> > >> //Marcus
> > >>
> > >>
> > >> --
> > >> Marcus Herou CTO and co-founder Tailsweep AB
> > >> +46702561312
> > >> marcus.he...@tailsweep.com
> > >> http://www.tailsweep.com/
> > >>
> > >
> > >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> >
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
