Hi

I am facing a "Spill failed" issue when I increase io.sort.mb to 1500
or 2000. The job runs fine with 500 or 1000, though I still see some
spilled records (780 million spilled out of 5.3 billion total map output records).

Each node has 48 GB of RAM; I configured 9 GB of JVM heap for each mapper
and run 4 mappers per node.
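
For concreteness, this is roughly how the relevant settings are wired up
(a minimal sketch using Hadoop 1.x property names; the class and the job
setup around it are simplified, not our actual code):

import org.apache.hadoop.conf.Configuration;

public class JobSettings {
    public static Configuration buildConf() {
        Configuration conf = new Configuration();
        // Map-side sort buffer: fine at 500/1000 MB, fails at 1500/2000
        conf.setInt("io.sort.mb", 2000);
        // Merge width, raised from the default of 10 (see below)
        conf.setInt("io.sort.factor", 500);
        // 9 GB of JVM heap per map task
        conf.set("mapred.child.java.opts", "-Xmx9g");
        // 4 map slots per TaskTracker (cluster-side setting on each 48 GB node)
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        return conf;
    }
}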
There was no heap space issue. I got the following error:

java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1028)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:690)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.gleaning_cube(Mapper.java:450)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.netflow_mapper(Mapper.java:317)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.map(Mapper.java:387)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at com.guavus.mapred.common.collection.ValueCollection.readFields(ValueCollection.java:24)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1343)
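
From the trace, the spill fails while the combiner is being run inside
sortAndSpill, and the EOFException comes from readFields of our custom
Writable, ValueCollection (the readInt at ValueCollection.java:24). For
context, its shape is roughly like the sketch below (simplified; the
payload and field names here are placeholders, not the real code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ValueCollection implements Writable {
    private long[] values = new long[0]; // placeholder payload

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);   // length header...
        for (long v : values) {
            out.writeLong(v);          // ...followed by the payload
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();          // this is the readInt that hits EOF
        values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = in.readLong();
        }
    }
}

If the write()/readFields() pair were simply out of sync, I would expect
the EOF on every run, not only at higher io.sort.mb values.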
I also increased io.sort.factor from the default (10) to 500, but the error
still occurs. My understanding is that io.sort.factor only controls how many
spill segments are merged at once, so it may not affect the in-memory spill
path where this fails.

Can someone comment on the possible reason for this issue? It does not occur
for lower values of io.sort.mb.

Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, 
Gurgaon,Haryana.
Mobile Number +91-9899949788
