I'm new to setting up Hadoop's scheduler and I'm trying to set up the
FairScheduler on a 3-node cluster. The initial setup is fine but
throughput is abysmal.

Each node is configured with 16 map task capacity and 8 reduce task
capacity. Most jobs being run are reading data from Cassandra installed
on the same nodes using ColumnFamilyInputFormat.
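
For reference, a minimal sketch of how that slot capacity is usually
expressed in each TaskTracker's mapred-site.xml, assuming the MR1
(Hadoop 0.20/1.x) property names; the actual file on this cluster may
differ:

```xml
<!-- mapred-site.xml fragment (sketch; goes inside <configuration>) -->
<!-- Assumes MR1 property names. 3 nodes x 16 map slots = 48 in total. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```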

With the default scheduler these jobs take from 5 to 15 minutes.

When I plug in the FairScheduler they take from one to many hours.
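
"Plugging in" here means roughly the following on the JobTracker. This
is a sketch assuming the MR1 contrib FairScheduler, with its jar
already on the JobTracker classpath, not a copy of the actual file:

```xml
<!-- mapred-site.xml fragment on the JobTracker (sketch) -->
<!-- Replaces the default FIFO JobQueueTaskScheduler with FairScheduler. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```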

What I see is that the map task capacity is not being used. Jobs now
only run 3 map tasks at a time whereas before they would always run all
48 map tasks.

This is without any custom fair-scheduler.xml configuration. But I've
also tried configuring userMaxJobsDefault, maxRunningJobs, and weight
without any luck.
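
To illustrate where those three settings live, here is a sketch of an
allocation file in the MR1 FairScheduler format. The pool name and all
values are hypothetical, chosen only for illustration, and are not the
ones tried on this cluster:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (sketch); the pool name and values are made up -->
<allocations>
  <pool name="example_pool">
    <!-- Cap on concurrently running jobs in this pool -->
    <maxRunningJobs>10</maxRunningJobs>
    <!-- Share of the cluster relative to other pools (default 1.0) -->
    <weight>2.0</weight>
  </pool>
  <!-- Running-job limit for users without an explicit limit -->
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>
```

The scheduler finds this file through the
mapred.fairscheduler.allocation.file property in mapred-site.xml.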

I've also tried adding mapred.fairscheduler.locality.delay=0, again
without any luck.
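
Written out as a mapred-site.xml entry, that setting would look
something like this (a sketch; the value is in milliseconds, and 0
means the scheduler does not wait for a node-local slot):

```xml
<!-- mapred-site.xml fragment on the JobTracker (sketch) -->
<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <value>0</value>
</property>
```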

Is it possible with the FairScheduler to get the same throughput when
only one job is running as with Hadoop's default scheduler? Am I
missing something obvious?

~mck


-- 
Linux, because I'd rather own a free OS than steal one that's not worth
paying for. 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |
