Hi,

We have a map-only job that processes files produced by Chukwa, and it works like a charm. We have "mapred.job.map.memory.mb" set to 1536, and that limit holds for the hundreds of thousands of files we process. Recently, however, for two particular files the process goes beyond this limit and quits with a memory-limit-exceeded error.

We analyzed the records in those files and found nothing unusual; the maximum record length is 4 KB. A heap dump shows the process using no more than 200 MB of heap, yet when I watch the process in top, its virtual memory reaches almost 2 GB. Any pointers on how I can debug this? It does not look like a heap memory issue; it looks more like a native memory issue.
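For reference, this is roughly how I have been comparing virtual vs. resident memory (a sketch, assuming a Linux host with /proc; the pid used below is the shell's own, only so the snippet runs stand-alone — for the real task it would be the mapper JVM's pid):

```shell
# Compare virtual (VmSize) and resident (VmRSS) memory for a process.
# Substitute the mapper JVM's pid for $$ on the cluster node.
PID=$$
grep -E 'VmSize|VmRSS' /proc/$PID/status

# A per-mapping breakdown can show what backs the virtual size
# (thread stacks, mmap'd files, malloc arenas, etc.):
#   pmap -x $PID | sort -k3 -n | tail
# On newer JDKs, starting the JVM with -XX:NativeMemoryTracking=summary
# and then running `jcmd $PID VM.native_memory summary` breaks native
# allocations down by category.
```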
How do I find out exactly what is causing the spike in memory?

Amit