Hi,

I see "Shuffle and Sort Configuration Tuning" in "Hadoop---The Definitive 
Guide", which told me that each job in the same cluster can use different 
parameters below without restart the cluster. But some of my partner told me 
not. For some reason I have no Linux cluster at hand. I wonder whether it is 
possible to use different parameters below in different jobs without restart 
the cluster. There is a simple example below to explain what I mean.

If so, which parameters can be set differently per job? Can all of the parameters 
below? Are there any others? Thank you!

e.g., I start a Hadoop cluster normally and submit Job A with "io.sort.mb" set to 
100, "io.sort.record.percent" set to 0.05, etc. Before Job A finishes, I want to 
submit Job B to the same cluster with "io.sort.mb" set to 120, 
"io.sort.record.percent" set to 0.08, etc.
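
To make it concrete, here is a rough sketch of how I imagine setting these values 
per job from the client side, using the old mapred JobConf API. The class name 
SubmitJobA and the identity mapper/reducer are just placeholders to keep the 
example self-contained; whether the task trackers actually honor these per-job 
values is exactly what I am asking about.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitJobA {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJobA.class);
        conf.setJobName("job-a");

        // Per-job shuffle/sort tuning: these values go into this job's
        // configuration only, not into the cluster-wide mapred-site.xml.
        // Job B would be submitted the same way with io.sort.mb = 120 and
        // io.sort.record.percent = 0.08.
        conf.setInt("io.sort.mb", 100);
        conf.setFloat("io.sort.record.percent", 0.05f);

        // Identity mapper/reducer just to make the sketch complete.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

(Or, if the driver uses ToolRunner, I suppose Job B could be submitted with 
"-D io.sort.mb=120 -D io.sort.record.percent=0.08" on the command line; again, 
I am not sure whether that takes effect per job, which is my question.)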

parameters:
io.sort.mb
io.sort.record.percent
io.sort.spill.percent
io.sort.factor
min.num.spills.for.combine
mapred.compress.map.output
mapred.map.output.compression.codec
mapred.reduce.parallel.copies
mapred.reduce.copy.backoff
io.sort.factor
mapred.job.shuffle.input.buffer.percent
mapred.job.shuffle.merge.percent
mapred.inmem.merge.threshold
mapred.job.reduce.input.buffer.percent

Best regards,

Evan
