Hi,
Our current production environment is Hadoop 1.1 using MRv1, and it stores 
different kinds of data sets in Avro format.
In the default case, we set map.java.opts=-Xmx1024M and reduce.java.opts=-Xmx2048M. 
For some data sets, the end user can raise these to map.java.opts=-Xmx2048M and 
reduce.java.opts=-Xmx3072M in the Pig/Hive script, since some data sets are 
complex or big enough to require more runtime heap space in either the map or the 
reduce task.
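For concreteness, the per-script override in a Pig session looks roughly like the 
sketch below (I am assuming the Hadoop 1.x property names 
mapred.map.child.java.opts / mapred.reduce.child.java.opts are what 
map.java.opts / reduce.java.opts above correspond to):

    -- per-script heap override in Pig on MRv1 (sketch)
    set mapred.map.child.java.opts '-Xmx2048M';
    set mapred.reduce.child.java.opts '-Xmx3072M';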
This works fine in our current cluster, but I have the following questions about 
YARN, in case our production environment is upgraded to Hadoop 2 with YARN in the 
future.
My understanding is that there are several parameters that set a node's memory 
capacity in YARN for MapReduce. If each node in our cluster has 64G of physical 
memory, here is what I think so far:
1) yarn.nodemanager.resource.memory-mb=57344 (i.e. 56G, leaving 8G for the OS 
   plus the DataNode and NodeManager daemons)
2) I will leave yarn.nodemanager.vmem-pmem-ratio at its default of 2.1
3) mapreduce.map.memory.mb=1536
   mapreduce.reduce.memory.mb=2560
   mapreduce.map.java.opts=-Xmx1024m
   mapreduce.reduce.java.opts=-Xmx2048m
   yarn.scheduler.minimum-allocation-mb=512
   yarn.scheduler.maximum-allocation-mb=4096
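If it helps, here is how I would expect those to land in the config files (just 
my sketch, assuming the usual yarn-site.xml / mapred-site.xml split):

    <!-- yarn-site.xml (sketch): node capacity and scheduler limits -->
    <property><name>yarn.nodemanager.resource.memory-mb</name><value>57344</value></property>
    <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value></property>
    <property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
    <property><name>yarn.scheduler.maximum-allocation-mb</name><value>4096</value></property>

    <!-- mapred-site.xml (sketch): per-task defaults -->
    <property><name>mapreduce.map.memory.mb</name><value>1536</value></property>
    <property><name>mapreduce.reduce.memory.mb</name><value>2560</value></property>
    <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property>
    <property><name>mapreduce.reduce.java.opts</name><value>-Xmx2048m</value></property>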
From my understanding, the above settings will force each container to have a 
minimum of 512M of memory, up to a maximum of 4096M. By default a mapper 
container will ask for a 1536M allocation and a reducer container will ask for a 
2560M allocation, correct? If so, that is close to my current production default 
settings.
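(As a sanity check on those numbers, assuming the DefaultResourceCalculator: 
57344 / 1536 ≈ 37 concurrent map containers per node, and 57344 / 2560 ≈ 22 
reduce containers, before the ApplicationMaster's own container; and since 1536 
and 2560 are both multiples of the 512M minimum allocation, the scheduler should 
not round the requests up.)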
Now, my question is: how does the client, in either a Pig or Hive script, ask 
for more memory in YARN at run time?
For example, for data set A, I know from MRv1 that the mapper task needs a 2G 
heap and the reducer task needs a 3G heap. In this case, what parameters should 
I set in the Pig or Hive session?
For the mapper task, should I do
    set mapreduce.map.memory.mb=2048;
or
    set mapreduce.map.java.opts=-Xmx2048m;
or do I have to set both:
    set mapreduce.map.memory.mb=2560;
    set mapreduce.map.java.opts=-Xmx2048m;
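My current guess, which I would like confirmed, is that both need to be set, 
with java.opts kept comfortably below memory.mb to leave headroom for non-heap 
JVM overhead. For example, in a Pig session:

    -- sketch: what I would try for data set A on YARN
    set mapreduce.map.memory.mb 2560;          -- container size in MB
    set mapreduce.map.java.opts '-Xmx2048m';   -- JVM heap inside that container
    set mapreduce.reduce.memory.mb 3584;       -- <= yarn.scheduler.maximum-allocation-mb
    set mapreduce.reduce.java.opts '-Xmx3072m';

(In a Hive session the equivalent would be set mapreduce.map.memory.mb=2560; and 
so on.)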
Another question: is yarn.scheduler.maximum-allocation-mb=4096 a hard limit? 
That is, once it is set, can any mapper or reducer container ask for more than 
4G of memory? How?
Thanks
Yong