I'm new to setting up Hadoop's scheduler, and I'm trying to set up the FairScheduler on a 3-node cluster. The initial setup works, but throughput is abysmal.
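For reference, here is a minimal sketch of how the fair scheduler is typically switched on for an MR1 JobTracker (Hadoop 0.20/1.x). It assumes the fair scheduler jar is already on the JobTracker's classpath. The allocation file path is hypothetical, and the locality delay is shown with the value 0 (milliseconds, meaning no wait for a data-local slot):

```xml
<!-- mapred-site.xml on the JobTracker (MR1, Hadoop 0.20/1.x assumed) -->
<configuration>
  <!-- Replace the default FIFO scheduler with the fair scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <!-- Hypothetical location of the allocation file -->
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
  <!-- Delay-scheduling wait in milliseconds; 0 means no wait for a local slot -->
  <property>
    <name>mapred.fairscheduler.locality.delay</name>
    <value>0</value>
  </property>
</configuration>
```

The following allocation file uses the userMaxJobsDefault, maxRunningJobs and weight elements. The pool name and all values are hypothetical and serve only to show where each element goes:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: illustrative values only -->
<allocations>
  <!-- Hypothetical pool for the Cassandra-reading jobs -->
  <pool name="cassandra-jobs">
    <maxRunningJobs>10</maxRunningJobs>
    <weight>2.0</weight>
  </pool>
  <!-- Default cap on concurrently running jobs per user -->
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
```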
Each node is configured with a capacity of 16 map tasks and 8 reduce tasks. Most of the jobs read data from Cassandra, which is installed on the same nodes, using ColumnFamilyInputFormat. With the default scheduler these jobs take 5 to 15 minutes. With the fair scheduler plugged in they take anywhere from one hour to many hours.

What I see is that the map task capacity is not being used. Jobs now run only 3 map tasks at a time, whereas before they would always fill all 48 map slots (3 nodes × 16). This is without any custom fair-scheduler.xml configuration, but I've also tried configuring userMaxJobsDefault, maxRunningJobs, and weight without any luck. Adding mapred.fairscheduler.locality.delay=0 didn't help either.

Is it possible with the fair scheduler to get the same throughput when only one job is running as with Hadoop's default scheduler? Am I missing something obvious?

~mck

-- 
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no | Java XSS Filter |
