hi hadoopers, how's your weekend going? i do run out of idea at this point abt behavior of capacity scheduler; been stuck with this for a day and night.
I referred to this doc: http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0/capacity_scheduler.html#Configuring+properties+for+queues Details of environment set up are at the bottom of this email. 1. created 3 queues: orange 40%, apple 40%, default 20% (mapred.capacity-scheduler.queue.orange.capacity) 2. only orange queue: mapred.capacity-scheduler.queue.orange.maximum-capacity = 50 at http://job:50030/, cluster summary shows - map task capacity = 32 - red task capacity = 12 at http://job:50030/scheduler it shows: - 3 queues: orange, apple, default - orange and apple queues: each queue get 12 map tasks capacity, 4 reduce task capacity - default queue: get 6map, 2red Everything seems right up to this point. Fun part begin. 1. i submitted 4 of wordcount to orange queue. (each wordcount use 4 map tasks, so 4 of w.c. jobs will use totally 16 map tasks) 2. default queue and apple queue are both no jobs running 3. 3 of 4 wordcount jobs were started map task right away while map task of the 4th wordcount job wasn't started. 4. from webUI, scheduling information of orange queue. It said "Used capacity: 12 (100.0% of Capacity)" while next line said "Maximum capacity: 16 slots" So what's going on with other 4 slots ? why they are not get used. Is capacity-scheduler supposed to start using extra slots until it hit the Max capacity ? (from the variable of mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity) (there are no other jobs at all in the cluster) I am really thankful for reading up to this point. Truly hope someone can shed some light on this. P ######### ## Settings, ######### - 4 machines of CentOS5.6, 8GB, 250GHD (1 nn + jt, 1 snn, 2 dn) - CDH3u0 - java version "1.6.0_25" ######### ## Full WebUI of all queues ######### ORANGE Queue configuration Capacity Percentage: 40.0% User Limit: 100% Priority Supported: YES ------------- Map tasks Capacity: 12 slots Maximum capacity: 16 slots Used capacity: 12 (100.0% of Capacity) Running tasks: 12 Active users: User 'apps': 12 (100.0% of used capacity) ------------- Reduce tasks Capacity: 4 slots Maximum capacity: 6 slots Used capacity: 0 (0.0% of Capacity) Running tasks: 0 ------------- Job info Number of Waiting Jobs: 0 Number of Initializing Jobs: 0 Number of users who have submitted jobs: 1 APPLE Queue configuration Capacity Percentage: 20.0% User Limit: 100% Priority Supported: NO ------------- Map tasks Capacity: 6 slots Used capacity: 0 (0.0% of Capacity) Running tasks: 0 ------------- Reduce tasks Capacity: 2 slots Used capacity: 0 (0.0% of Capacity) Running tasks: 0 ------------- Job info Number of Waiting Jobs: 0 Number of Initializing Jobs: 0 Number of users who have submitted jobs: 0 DEFAULT Queue configuration Capacity Percentage: 40.0% User Limit: 100% Priority Supported: YES ------------- Map tasks Capacity: 12 slots Used capacity: 0 (0.0% of Capacity) Running tasks: 0 ------------- Reduce tasks Capacity: 4 slots Used capacity: 0 (0.0% of Capacity) Running tasks: 0 ------------- Job info Number of Waiting Jobs: 0 Number of Initializing Jobs: 0 Number of users who have submitted jobs: 0 ######### ## capacity-scheduler.xml ######### <configuration> <!-- Global config --> <property> <name>mapred.capacity-scheduler.init-poll-interval</name> <value>5000</value> </property> <property> <name>mapred.capacity-scheduler.init-worker-threads</name> <value>20</value> </property> <!-- Queeu: default --> <property> <name>mapred.capacity-scheduler.queue.default.capacity</name> <value>20</value> </property> <!-- Queue: orange --> <property> <name>mapred.capacity-scheduler.queue.orange.capacity</name> <value>40</value> </property> <property> <name>mapred.capacity-scheduler.queue.orange.maximum-capacity</name> <value>50</value> </property> <property> <name>mapred.capacity-scheduler.queue.orange.maximum-initialized-jobs-per-user</name> <value>5000</value> </property> <property> <name>mapred.capacity-scheduler.queue.orange.supports-priority</name> <value>true</value> </property> <!-- Queue: apple --> <property> <name>mapred.capacity-scheduler.queue.apple.capacity</name> <value>40</value> </property> <property> <name>mapred.capacity-scheduler.queue.orange.maximum-initialized-jobs-per-user</name> <value>5000</value> </property> <!-- <property> <name>mapred.capacity-scheduler.queue.apple.maximum-capacity</name> <value>-1</value> </property> --> <property> <name>mapred.capacity-scheduler.queue.apple.supports-priority</name> <value>true</value> </property>
