RowSimilarityJob hangs during CooccurrencesMapper
-------------------------------------------------

                 Key: MAHOUT-577
                 URL: https://issues.apache.org/jira/browse/MAHOUT-577
             Project: Mahout
          Issue Type: Bug
          Components: Collaborative Filtering
    Affects Versions: 0.4
         Environment: Linux Debian 5.0.5, 12 GB RAM, Hadoop 20.3 installation
            Reporter: Maya Hristakeva
            Priority: Blocker


Hello,

When I run a RowSimilarityJob on a 146682 x 138351 matrix, the job gets 
through the RowWeightMapper and WeightedOccurrencesPerColumnReducer but hangs 
during the CooccurrencesMapper, even though the map tasks are reported as 
100% complete. 

The command I use to run the job is: 

hadoop jar mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
  -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
  -Dmapred.reduce.tasks=8 -Dmapred.map.tasks=200 \
  -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
  --tempDir /user/maya.hristakeva/temp/lda/5 \
  --numberOfColumns 138351 \
  --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
  --maxSimilaritiesPerRow 10

The output of the mappers that show 100% complete but are still hanging is: 

syslog logs

01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
.
.
.
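
As a reading aid for the spill entries above, here is a minimal sketch (not 
part of the original report) that computes how many records and bytes each 
spill covers. It assumes both the byte buffer and the record index are 
circular, so differences are taken modulo bufvoid and length; the helper names 
fields and spill_sizes are made up for illustration.

```python
import re

# Spill entries copied from the syslog excerpt (timestamps and logger dropped).
LOG = """\
bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
kvstart = 65461; kvend = 327605; length = 327680
bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
kvstart = 327605; kvend = 262068; length = 327680
bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
kvstart = 262068; kvend = 196531; length = 327680
"""


def fields(line):
    """Parse 'name = number' pairs from one log line into a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+) = (\d+)", line)}


def spill_sizes(log):
    """Return (records, bytes) per spill from alternating buf/kv lines.

    Assumption: both buffers wrap around, so sizes are computed modulo
    bufvoid (bytes) and length (record slots).
    """
    parsed = [fields(line) for line in log.splitlines() if line.strip()]
    sizes = []
    for buf, kv in zip(parsed[0::2], parsed[1::2]):
        nbytes = (buf["bufend"] - buf["bufstart"]) % buf["bufvoid"]
        nrecords = (kv["kvend"] - kv["kvstart"]) % kv["length"]
        sizes.append((nrecords, nbytes))
    return sizes


if __name__ == "__main__":
    for nrecords, nbytes in spill_sizes(LOG):
        print(nrecords, "records,", nbytes, "bytes,",
              round(nbytes / nrecords, 1), "bytes/record")
```

For the three entries shown, each spill covers roughly 262,000 records in just 
under 10 MB, about 38 bytes per record. That matches the "record full = true" 
message: the spills are triggered by the record index filling up, not by the 
byte buffer.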

This problem does not occur when I use a toy matrix of 100 x 100, but once I 
give it the original matrix of ..... the problem is always reproducible. 

Any ideas on what could be causing this? 

Thanks, 
Maya Hristakeva




