Hi,

Our current production environment is Hadoop 1.1 using MRv1, which stores different kinds of data sets in Avro format. By default we set mapred.map.child.java.opts=-Xmx1024M and mapred.reduce.child.java.opts=-Xmx2048M, and for some data sets the end user can raise these to -Xmx2048M and -Xmx3072M in the Pig/Hive script, since some data sets are complex or large enough to need more runtime heap in either the map or the reduce task. This works fine in our current cluster, but I have the following questions about YARN, in case our production cluster is upgraded to Hadoop 2 with YARN in the future.

My understanding is that there are several parameters that set a node's memory capacity for MapReduce under YARN. If each node in our cluster has 64G of physical memory, here is what I have so far:

1) yarn.nodemanager.resource.memory-mb=57344 (56G, leaving 8G for the OS plus the DataNode and NodeManager daemons)
2) I will leave yarn.nodemanager.vmem-pmem-ratio at the default of 2.1
3) mapreduce.map.memory.mb=1536
   mapreduce.reduce.memory.mb=2560
   mapreduce.map.java.opts=-Xmx1024m
   mapreduce.reduce.java.opts=-Xmx2048m
   yarn.scheduler.minimum-allocation-mb=512
   yarn.scheduler.maximum-allocation-mb=4096

From my understanding, the above settings will force each container to have at least 512M of memory, but at most 4096M. By default a mapper container will ask for a 1536M allocation and a reducer container will ask for a 2560M allocation, correct? If so, this is close to my current production defaults. (A sketch of these settings in yarn-site.xml/mapred-site.xml form is at the end of this message.)

Now, my question is: how does the client, in either a Pig or Hive script, ask for more memory in YARN at run time? For example, for data set A, I know from MRv1 that the map task needs 2G of heap and the reduce task needs 3G of heap. In that case, what parameters should I set in the Pig or Hive session? For the map task, should I do

    set mapreduce.map.memory.mb=2048

or

    set mapreduce.map.java.opts=-Xmx2048m

or do I have to set both:

    set mapreduce.map.memory.mb=2560
    set mapreduce.map.java.opts=-Xmx2048m

(See the Pig/Hive sketch at the end of this message.)

Another question: is yarn.scheduler.maximum-allocation-mb=4096 a hard limit? That is, once it is set, can any mapper or reducer container ask for more than 4G of memory? How?

Thanks
Yong
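P.S. For concreteness, here is a sketch of how I think the settings in 2) and 3) would look in yarn-site.xml and mapred-site.xml. The exact values (57344 for 56G, the 512M of headroom between java.opts and the container size) are just my working assumptions for a 64G node, not something I have tested:

    <!-- yarn-site.xml (sketch; values are assumptions for a 64G node) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value>  <!-- 56G in MB; 8G left for OS, DataNode, NodeManager -->
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>    <!-- default virtual-to-physical memory ratio -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>512</value>    <!-- smallest container the scheduler will grant -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>4096</value>   <!-- largest container the scheduler will grant -->
    </property>

    <!-- mapred-site.xml (sketch) -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>       <!-- container size requested per map task -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024m</value>  <!-- JVM heap, kept below the container size -->
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2560</value>       <!-- container size requested per reduce task -->
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx2048m</value>  <!-- JVM heap, kept below the container size -->
    </property>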
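And here is the kind of per-job override I have in mind for data set A. Whether both parameters are actually needed is exactly my question above; the 2560/3584 container sizes are just my guess at keeping roughly 512M of headroom above the heap, staying under the 4096 maximum:

    -- Pig (data set A): ask for a bigger container and a bigger heap
    set mapreduce.map.memory.mb '2560';
    set mapreduce.map.java.opts '-Xmx2048m';
    set mapreduce.reduce.memory.mb '3584';
    set mapreduce.reduce.java.opts '-Xmx3072m';

    -- Hive equivalent
    SET mapreduce.map.memory.mb=2560;
    SET mapreduce.map.java.opts=-Xmx2048m;
    SET mapreduce.reduce.memory.mb=3584;
    SET mapreduce.reduce.java.opts=-Xmx3072m;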