Srinivasarao Daruna created MAHOUT-1627:
-------------------------------------------
Summary: Problem with ALS Factorizer MapReduce version when
working with oozie because of files in distributed cache. Error: Unable to read
sequence file from cache.
Key: MAHOUT-1627
URL: https://issues.apache.org/jira/browse/MAHOUT-1627
Project: Mahout
Issue Type: Bug
Components: Collaborative Filtering
Affects Versions: 1.0
Environment: Hadoop
Reporter: Srinivasarao Daruna
The MapReduce version of the ALS factorizer fails when it is run through Oozie
in a distributed environment.
Steps:
1) Built the Mahout 1.0 jars and picked the mahout-mrlegacy jar.
2) Created a Java class that calls ParallelALSFactorizationJob with the
respective inputs.
3) Submitted the job, which in turn submitted a series of MapReduce jobs to
perform the factorization.
4) The job failed at MultithreadedSharingMapper with the error Unable to read
sequence file "<ourprogram>.jar", pointing at the
readMatrixByRowsFromDistributedCache method of
org.apache.mahout.cf.taste.hadoop.als.ALS.
Cause: The ALS class reads its input files, which are sequence files, from the
distributed cache using the readMatrixByRowsFromDistributedCache method.
However, when running under Oozie, the program jar is also copied to the
distributed cache alongside the input files. Because the ALS class tries to
read every file in the distributed cache, it fails when it encounters the jar.
The remedy would be to add a condition that skips jar files when reading from
the distributed cache.
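The proposed condition could look roughly like the sketch below. It is only an
illustration, not the actual Mahout code: the class CacheFileFilter and the
method nonJarFiles are hypothetical names, the cache entries are made-up
example paths, and plain strings stand in for Hadoop Path objects so that the
example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: filters the list of distributed
// cache entries before they are handed to a sequence file reader.
class CacheFileFilter {

    // Keeps every entry whose name does not end in ".jar" (case-insensitive).
    static List<String> nonJarFiles(List<String> cacheFiles) {
        List<String> result = new ArrayList<>();
        for (String file : cacheFiles) {
            if (!file.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up cache contents: two input files plus the jar added by Oozie.
        List<String> cacheFiles = Arrays.asList(
            "/cache/U-1/part-m-00000",
            "/cache/lib/ourprogram.jar",
            "/cache/U-1/part-m-00001");
        // Only the non-jar entries would be passed on for reading.
        System.out.println(nonJarFiles(cacheFiles));
    }
}
```

Filtering by file extension is the simplest form of the condition. It assumes
that the input sequence files never carry a .jar suffix, and it would not help
if Oozie placed other non-sequence files in the cache.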
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)