Jakub created MAHOUT-1226:
-----------------------------

             Summary: mahout ssvd Bt-job bug
                 Key: MAHOUT-1226
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1226
             Project: Mahout
          Issue Type: Bug
    Affects Versions: 0.7
         Environment: mahout-0.7
hadoop-0.20.205.0
            Reporter: Jakub


When running the mahout ssvd job, the Bt-job step creates lots of spills to disk. These can be minimized by tuning Hadoop's io.sort.mb parameter. However, when io.sort.mb is larger than ~1100 (e.g. 1500), I get the following exception:

java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:261)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:255)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.flushBlock(SparseRowBlockAccumulator.java:65)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.collect(SparseRowBlockAccumulator.java:75)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:158)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:102)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: next value iterator failed
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:166)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:322)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1502)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockWritable.readFields(SparseRowBlockWritable.java:60)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
    ... 7 more
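
The innermost frames show the symptom: SparseRowBlockWritable.readFields asks Varint.readUnsignedVarInt for more bytes than the combiner's value iterator has left, and DataInputStream.readByte throws EOFException. A minimal sketch of that failure mode in isolation (the class name and the truncation are mine for illustration; Varint.writeUnsignedVarInt is assumed to be the encoder matching the readUnsignedVarInt call in the trace):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.mahout.math.Varint;

    public class VarintEofDemo {
      public static void main(String[] args) throws IOException {
        // Encode one unsigned varint (300 needs two bytes).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Varint.writeUnsignedVarInt(300, new DataOutputStream(bytes));

        // Drop the last byte, as if the serialized record were cut short.
        byte[] truncated = Arrays.copyOf(bytes.toByteArray(), bytes.size() - 1);

        // readByte() runs off the end of the stream and throws
        // java.io.EOFException, matching the innermost "Caused by" above.
        Varint.readUnsignedVarInt(new DataInputStream(new ByteArrayInputStream(truncated)));
      }
    }

Whatever truncates or mispositions the stream happens somewhere between sortAndSpill and the combiner's value iterator; the sketch only reproduces the symptom, not the cause.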

By tuning this value I have already managed to reduce the number of spills from 100 (with the default io.sort.mb) to 10, and disk usage dropped from around 7 GB for my small data set to around 900 MB, so fixing this issue could bring big performance improvements.

I have plenty of free RAM, so this is not an out-of-memory issue.
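
For reference, a minimal sketch of the tuning workaround (the class name and the value 1024 are illustrative; Configuration.setInt is the standard Hadoop API, and I am assuming the job configuration is reachable before submission):

    import org.apache.hadoop.conf.Configuration;

    public class SortMbWorkaround {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Values up to ~1100 work; above that (e.g. 1500) the spill fails.
        conf.setInt("io.sort.mb", 1024);
        System.out.println("io.sort.mb = " + conf.getInt("io.sort.mb", -1));
      }
    }

If the mahout driver honors Hadoop's generic options, the same property can presumably also be passed on the command line as -Dio.sort.mb=1024.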

