[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118267#comment-13118267 ]

Alvin AuYoung commented on MAHOUT-542:
--------------------------------------

Hi Sebastian,

First of all, many thanks for contributing this ALS implementation. It's very 
useful. Like others on this list, I'm trying to run some experiments on it 
using the Netflix data, but I'm seeing an error that I'm having trouble 
diagnosing. After the first 4 jobs complete, the reduce copiers fail in the 
5th job (Mapper-SolvingReducer). I'm running hadoop-0.20.2 with Mahout checked 
out from trunk, so I believe any patches you've mentioned should already be 
incorporated. 

Here is a description of the job I'm running:

MAHOUT-JOB: /home/auyoung/mahout/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/30 01:20:56 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=training_all_triplets_norm, --lambda=0.065, --numFeatures=25, --numIterations=5, --output=als.out, --startPhase=0, --tempDir=temp}
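
For context, an invocation producing the logged arguments above would look roughly like the following sketch. The jar path and input/output names are taken from the log; the driver class name is my assumption based on the ParallelALSFactorizationJob mentioned below, so treat it as illustrative only:

```shell
# Hypothetical launch command reconstructed from the logged arguments above
hadoop jar mahout-examples-0.6-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob \
  --input training_all_triplets_norm \
  --output als.out \
  --lambda 0.065 \
  --numFeatures 25 \
  --numIterations 5 \
  --tempDir temp
```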

Do you have any ideas what might be wrong? I'm running on a physical cluster 
of 20 slaves, each with 2 mappers and 2 reducers; each JVM has > 8 GB of 
memory, HADOOP_HEAPSIZE is > 2 GB, and io.sort.mb is at its maximum of 2047. 
There is also plenty of disk space remaining. Here is a transcript of one of 
the several failures in the ParallelALSFactorizationJob-Mapper-SolvingReducer:

2011-09-30 02:05:37,115 INFO org.apache.hadoop.mapred.Merger: Merging 16 sorted segments
2011-09-30 02:05:37,115 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 16 segments left of total size: 1039493457 bytes
2011-09-30 02:05:37,116 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201109300120_0005_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermediate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2560)
        ... 1 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
        at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
        at org.apache.mahout.cf.taste.hadoop.als.IndexedVarIntWritable.readFields(IndexedVarIntWritable.java:64)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
        ... 8 more

2011-09-30 02:05:37,116 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201109300120_0005_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2454)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
        at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
        at org.apache.mahout.cf.taste.hadoop.als.IndexedVarIntWritable.readFields(IndexedVarIntWritable.java:64)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:100)
        ... 7 more

        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2458)
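
The innermost EOFException comes from Varint.readUnsignedVarInt hitting the end of its stream mid-value while the comparator deserializes a key during the merge, which suggests a truncated or mis-framed record rather than a memory problem. This is not Mahout's actual Varint code, just a minimal sketch of the same variable-length encoding (7 data bits per byte, high bit as continuation flag), but it shows how a truncated buffer produces exactly this exception:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class VarintTruncation {

    // Encode an int as a varint: low 7 bits per byte, high bit set
    // on every byte except the last.
    static byte[] encodeUnsignedVarInt(int value) {
        byte[] buf = new byte[5];
        int i = 0;
        while ((value & ~0x7F) != 0) {
            buf[i++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[i++] = (byte) value;
        return Arrays.copyOf(buf, i);
    }

    // Decode a varint. If the stream ends while the continuation bit
    // is still set, DataInputStream.readByte throws EOFException --
    // the same bottom frame as in the stack trace above.
    static int readUnsignedVarInt(DataInputStream in) throws IOException {
        int value = 0;
        int shift = 0;
        while (true) {
            byte b = in.readByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = encodeUnsignedVarInt(300); // 300 needs two bytes
        // Simulate a truncated merge buffer: drop the final byte.
        byte[] truncated = Arrays.copyOf(full, full.length - 1);
        try {
            readUnsignedVarInt(new DataInputStream(new ByteArrayInputStream(truncated)));
            System.out.println("decoded without error");
        } catch (EOFException e) {
            System.out.println("EOFException while decoding truncated varint");
        }
    }
}
```

If keys really are being cut short, the question becomes where the bytes get lost between map output and the reduce-side merge; corrupted intermediate data (disk, compression codec) would show up the same way.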

Thanks,

Alvin

                
> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch, 
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch, 
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering 
> algorithm that uses matrix factorization, I spent some time reading through a 
> couple of the Netflix papers and stumbled upon the "Large-scale Parallel 
> Collaborative Filtering for the Netflix Prize" available at 
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with 
> Weighted-λ-Regularization" to factorize the preference-matrix and gives some 
> insights on how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using 
> Map/Reduce, so I sat down and created a prototype version. I'm not really 
> sure I got the mathematical details correct (they need some optimization 
> anyway), but I wanna put up my prototype implementation here per Yonik's law 
> of patches.
> Maybe someone has the time and motivation to work a little on this with me. 
> It would be great if someone could validate the approach taken (I'm willing 
> to help as the code might not be intuitive to read) and could try to 
> factorize some test data and give feedback then.
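
For reference, the objective the linked AAIM paper minimizes (a sketch, with symbols as in the paper) alternates between solving for the user matrix U and the item matrix M:

```latex
\min_{U,M} \sum_{(i,j) \in I} \left( r_{ij} - \mathbf{u}_i^{\top} \mathbf{m}_j \right)^2
  + \lambda \left( \sum_i n_{u_i} \lVert \mathbf{u}_i \rVert^2
                 + \sum_j n_{m_j} \lVert \mathbf{m}_j \rVert^2 \right)
```

Here I is the set of observed (user, item) ratings, n_{u_i} the number of ratings by user i, and n_{m_j} the number of ratings of item j; weighting each regularization term by these counts is the "weighted-λ" part. With one factor matrix fixed, each row of the other is an independent least-squares solve, which is what makes the per-row Map/Reduce parallelization natural.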

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

