[
https://issues.apache.org/jira/browse/MAHOUT-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi updated MAHOUT-1627:
----------------------------------
Fix Version/s: 0.12.0
> Problem with ALS Factorizer MapReduce version when working with oozie because
> of files in distributed cache. Error: Unable to read sequence file from cache.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1627
> URL: https://issues.apache.org/jira/browse/MAHOUT-1627
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.10.2
> Environment: Hadoop
> Reporter: Srinivasarao Daruna
> Assignee: Suneel Marthi
> Labels: legacy
> Fix For: 0.12.0
>
>
> There is a problem with the ALS Factorizer when running in a distributed
> environment under oozie.
> Steps:
> 1) Built the Mahout 1.0 jars and picked the mahout-mrlegacy jar.
> 2) Created a Java class that calls ParallelALSFactorizationJob with the
> respective inputs (a sketch of such an invocation follows these steps).
> 3) Submitted the job; a series of MapReduce jobs was launched to perform
> the factorization.
> 4) The job failed at MultithreadedSharingMapper with the error "Unable to
> read sequence file <ourprogram>.jar", pointing at
> org.apache.mahout.cf.taste.hadoop.als.ALS and its
> readMatrixByRowsFromDistributedCache method.
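> A minimal sketch of how the job was driven from the Java class, assuming the
> standard ToolRunner entry point; the paths, feature count, iteration count,
> and lambda below are placeholders, not the actual inputs used:
>
> import org.apache.hadoop.util.ToolRunner;
> import org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob;
>
> public class AlsDriverSketch {
>   public static void main(String[] args) throws Exception {
>     // Drive the factorization the same way the mahout CLI would;
>     // all option values here are illustrative only.
>     ToolRunner.run(new ParallelALSFactorizationJob(), new String[] {
>         "--input", "/data/ratings",
>         "--output", "/data/als-output",
>         "--numFeatures", "20",
>         "--numIterations", "10",
>         "--lambda", "0.065",
>         "--tempDir", "/tmp/als-tmp"
>     });
>   }
> }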
> Cause: The ALS class reads the input files, which are sequence files, from
> the distributed cache via the readMatrixByRowsFromDistributedCache method.
> However, when running under oozie, the program jar is also copied to the
> distributed cache along with the input files. Because the ALS class tries to
> read every file in the distributed cache, it fails when it encounters the
> jar. The remedy would be to add a condition that picks only files other than
> jars (see the sketch below).
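> One possible shape for that condition, as a sketch only; the helper name and
> the use of DistributedCache.getLocalCacheFiles are assumptions, not the
> actual Mahout code path:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.Path;
>
> public class CacheFileFilterSketch {
>
>   // Hypothetical helper: return only the cached paths that are not jars,
>   // so the sequence-file reader never touches the jar that oozie ships.
>   static List<Path> sequenceFileCandidates(Configuration conf) throws IOException {
>     List<Path> candidates = new ArrayList<Path>();
>     Path[] cachedFiles = DistributedCache.getLocalCacheFiles(conf);
>     if (cachedFiles != null) {
>       for (Path cachedFile : cachedFiles) {
>         // skip anything that looks like a jar; read everything else as before
>         if (!cachedFile.getName().endsWith(".jar")) {
>           candidates.add(cachedFile);
>         }
>       }
>     }
>     return candidates;
>   }
> }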
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)