I have a 10-node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 Gb Ethernet connection). After I trigger any MR job, it takes about 3-5 seconds to launch, by which I mean the time until I see any MR job completion % on the screen. I know that internally it is launching the job, initializing mappers, loading data, etc. What I want to know is: is this default/desired/expected Hadoop behavior, or are there ways to decrease this startup time?
Also, I feel my Hadoop jobs should run faster, but I still can't get them as fast as I think they should be. I have done some tuning. These are the parameters I have been experimenting with lately, but I still feel there is something missing that I could use:

- dfs.block.size
- mapred.compress.map.output
- mapred.map/reduce.tasks.speculative.execution
- mapred.tasktracker.map/reduce.tasks.maximum
- mapred.child.java.opts
- io.sort.mb
- io.sort.factor
- mapred.reduce.parallel.copies
- mapred.job.reuse.jvm.num.tasks

Thanks,
Praveenesh
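P.S. For reference, here is a minimal Python sketch of how the parameters above end up in Hadoop 1.x config files: dfs.block.size in hdfs-site.xml, the rest in mapred-site.xml. The values shown are the stock Hadoop 1.x defaults as I understand them from hdfs-default.xml and mapred-default.xml. They are a baseline to override, not tuning recommendations, so check them against the *-default.xml shipped with your version. The helper names to_site_xml/parse_site_xml and the JVM-reuse override in the usage part are hypothetical, just for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed stock Hadoop 1.x defaults (hdfs-default.xml); not tuning advice.
HDFS_PROPS = {
    "dfs.block.size": "67108864",  # 64 MB
}

# Assumed stock Hadoop 1.x defaults (mapred-default.xml); not tuning advice.
MAPRED_PROPS = {
    "mapred.compress.map.output": "false",
    "mapred.map.tasks.speculative.execution": "true",
    "mapred.reduce.tasks.speculative.execution": "true",
    "mapred.tasktracker.map.tasks.maximum": "2",
    "mapred.tasktracker.reduce.tasks.maximum": "2",
    "mapred.child.java.opts": "-Xmx200m",
    "io.sort.mb": "100",
    "io.sort.factor": "10",
    "mapred.reduce.parallel.copies": "5",
    "mapred.job.reuse.jvm.num.tasks": "1",  # -1 = reuse a JVM for any number of tasks
}


def to_site_xml(props):
    """Render a dict of properties as a Hadoop *-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_site_xml(text):
    """Read a *-site.xml <configuration> string back into a name -> value dict."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


if __name__ == "__main__":
    # Hypothetical override: let one JVM run all of a job's tasks on a node.
    overrides = {**MAPRED_PROPS, "mapred.job.reuse.jvm.num.tasks": "-1"}
    print(to_site_xml(HDFS_PROPS))  # contents for hdfs-site.xml
    print(to_site_xml(overrides))   # contents for mapred-site.xml
```

Keeping the values in plain dicts makes it easy to diff a tuned mapred-site.xml against the baseline with parse_site_xml before and after a change.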
