Hi Arpit,

I'm not certain of the exact cause of the exception (perhaps an integer
overflow somewhere?), but I'd like to point out that, in general,
increasing io.sort.mb to such a high value is not necessarily a good thing.
Sorting is an expensive operation whose cost grows super-linearly with the
number of records. Since you have a combiner, depending on how much it
reduces the record count, it can be significantly faster to sort the data in
small segments and then merge the combiner output, especially since the
large amount of RAM in your machines likely means the spill data will still
be in the page cache, so no disk I/O should be needed to read it back.
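
For what it's worth, here's a minimal sketch of how you could keep the
buffer moderate on the job itself. The property name is the same pre-YARN
one you're already using; the class name and the 500 MB value are purely
illustrative, not a recommendation:

    import org.apache.hadoop.conf.Configuration;

    public class SortBufferSetup {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Keep the in-memory sort buffer moderate: smaller segments sort
            // faster, and the combiner runs on every spill, shrinking the
            // data before the on-disk segments are merged.
            conf.setInt("io.sort.mb", 500);
            // ... build and submit the job with this conf as usual ...
        }
    }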

Also, check whether you actually need such a large buffer. Since you have a
very large number of records, if the individual records are small the map
task is likely spilling not because the data buffer is full, but because the
accounting area is full: each record costs a fixed amount of accounting
space regardless of its size. If that is the case, you may want to increase
io.sort.record.percent while keeping io.sort.mb the same.
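
To make that concrete, here's a back-of-the-envelope sketch. I'm assuming
the 1.x-era MapOutputBuffer, where each record takes a fixed 16 bytes of
accounting space and io.sort.record.percent defaults to 0.05, so treat the
numbers as ballpark figures:

    public class AccountingEstimate {
        public static void main(String[] args) {
            int   ioSortMb      = 1500;    // io.sort.mb
            float recordPercent = 0.05f;   // io.sort.record.percent (default)
            long  bufferBytes   = (long) ioSortMb * 1024 * 1024;
            long  acctBytes     = (long) (bufferBytes * recordPercent);
            long  maxRecords    = acctBytes / 16; // ~16 bytes per record
            System.out.println(maxRecords + " records before an accounting spill");
            // ~4.9 million records for a 1500 MB buffer -- tiny next to
            // your 5.3 billion map output records.
        }
    }

In other words, even a very large buffer spills after a few million small
records unless you shift some of its space from data to accounting.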

Finally, io.sort.factor is not related to this error; it controls the number
of on-disk segments merged in a single merge pass and doesn't apply to the
sort-and-spill phase at all.
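
To illustrate what io.sort.factor does affect, here's a deliberately
simplified model of the merge, where every pass combines up to 'factor'
segments into one. (Hadoop optimizes the first pass, so this is only an
approximation.)

    public class MergePasses {
        // Simplified: each pass merges up to 'factor' segments into one.
        static int passes(int segments, int factor) {
            int p = 0;
            while (segments > 1) {
                segments = (segments + factor - 1) / factor; // ceiling division
                p++;
            }
            return p;
        }
        public static void main(String[] args) {
            System.out.println(passes(100, 10));  // 2 passes at the default
            System.out.println(passes(100, 500)); // 1 pass at factor=500
        }
    }

Raising it can speed up the merge phase, but it has no bearing on the spill
that's failing here.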

Cheers,
Sven

-----Original Message-----
From: Arpit Wanchoo [mailto:arpit.wanc...@guavus.com] 
Sent: Tuesday, 7 August 2012 14:31
To: mapreduce-user@hadoop.apache.org
Subject: Spill Failed when io.sort.mb is increased

Hi

I am facing this "Spill failed" issue when I increase io.sort.mb to 1500 or
2000. It runs fine with 500 or 1000, though I still get some spilled records
(780 million spilled out of a total of 5.3 billion map output records).

I configured 9 GB of JVM heap for each mapper, with 4 mappers on each node,
and each node has 48 GB of RAM.
There was no heap-space issue. I got the following error:

java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1028)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:690)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.gleaning_cube(Mapper.java:450)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.netflow_mapper(Mapper.java:317)
        at com.guavus.mapred.bizreflex.job.BaseJob.Mapper.map(Mapper.java:387)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at com.guavus.mapred.common.collection.ValueCollection.readFields(ValueCollection.java:24)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1343)

I also increased io.sort.factor from the default (10) to 500, but the error
still occurs.

Can someone comment on what the possible reason for this issue could be? It
does not occur for lower values of io.sort.mb.

Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1, Vanijya Nikunj, Udyog Vihar Phase - V,
Gurgaon, Haryana.
Mobile Number +91-9899949788

