Your job client is running out of memory while generating the input splits; with 12 TB of input there are a lot of block locations to fetch and serialize before any mappers launch. You need a bigger heap for your driver program. Assuming you're launching the job with the hadoop jar command, you can do this by setting HADOOP_HEAPSIZE to a larger value in $HADOOP_HOME/conf/hadoop-env.sh.
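For example (HADOOP_HEAPSIZE is in MB and defaults to 1000; the 2000 below is only an illustrative value, size it to what your client machine can spare):

    # in $HADOOP_HOME/conf/hadoop-env.sh
    # Maximum heap size, in MB, for JVMs started by bin/hadoop
    # (default is 1000; 2000 is an example value, not a recommendation)
    export HADOOP_HEAPSIZE=2000

After editing the file, re-run your hadoop jar command; the larger heap applies to the client JVM that computes the splits.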
-Joey

On Jul 24, 2011 5:07 AM, "Gagan Bansal" <gagan.ban...@gmail.com> wrote:
> Hi All,
>
> I am getting the following error on running a job on about 12 TB of data.
> This happens before any mappers or reducers are launched.
> Also the job starts fine if I reduce the amount of input data. Any ideas
> as to what may be the reason for this error?
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.Arrays.copyOf(Arrays.java:2786)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
>     at org.apache.hadoop.io.UTF8.writeChars(UTF8.java:278)
>     at org.apache.hadoop.io.UTF8.writeString(UTF8.java:250)
>     at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:131)
>     at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:111)
>     at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:741)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1011)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>     at $Proxy6.getBlockLocations(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy6.getBlockLocations(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:359)
>     at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:380)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:178)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:234)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:946)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:938)
>     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:781)
>     at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:876)
>
> Gagan Bansal