Srinivasarao Daruna created MAHOUT-1627:
-------------------------------------------

             Summary: Problem with ALS Factorizer MapReduce version when 
working with oozie because of files in distributed cache. Error: Unable to read 
sequence file from cache.
                 Key: MAHOUT-1627
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1627
             Project: Mahout
          Issue Type: Bug
          Components: Collaborative Filtering
    Affects Versions: 1.0
         Environment: Hadoop
            Reporter: Srinivasarao Daruna


The MapReduce version of the ALS Factorizer fails when it is run on a 
distributed cluster through Oozie.

Steps:

1) Built the Mahout 1.0 jars and picked the mahout-mrlegacy jar.

2) Created a Java class that calls ParallelALSFactorizationJob with the 
respective inputs.

3) Submitted the job, which in turn submitted a series of MapReduce jobs to 
perform the factorization.

4) The job failed at MultithreadedSharingMapper with the error: Unable to read 
sequence file "<ourprogram>.jar". The stack trace points to the 
readMatrixByRowsFromDistributedCache method in 
org.apache.mahout.cf.taste.hadoop.als.ALS.

Cause: The ALS class reads its input files, which are sequence files, from the 
distributed cache using the readMatrixByRowsFromDistributedCache method. 
However, in an Oozie environment the program jar is also copied to the 
distributed cache along with the input files. Because the ALS class tries to 
read every file in the distributed cache as a sequence file, it fails when it 
encounters the jar.

The remedy would be to add a condition so that only files other than jars are 
read.
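
The proposed condition could look roughly like the sketch below. It is only an 
illustration, not the actual Mahout code: it uses plain path strings instead of 
Hadoop Path objects, and the class name CacheFileFilter, the method name 
withoutJars, and the example file names are all hypothetical. The real fix 
would apply an equivalent check to the cache file paths before they are opened 
as sequence files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: drops jar entries from a list of
// distributed cache paths so that only the remaining files are read.
class CacheFileFilter {

    /** Returns the cache entries that do not end in ".jar", preserving order. */
    static List<String> withoutJars(List<String> cachePaths) {
        List<String> result = new ArrayList<>();
        for (String path : cachePaths) {
            // Case-insensitive suffix check; assumes jars are identified by extension.
            if (!path.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up paths standing in for what Oozie places in the cache.
        List<String> cached = Arrays.asList(
            "/cache/example-workflow.jar",
            "/cache/part-m-00000",
            "/cache/part-m-00001");
        System.out.println(withoutJars(cached));
    }
}
```

Filtering by extension is the simplest reading of the remedy. It assumes that 
no matrix input file has a name ending in ".jar" and that jars are the only 
non-sequence files Oozie adds to the cache.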



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
