Pass parameter -Dmapred.reduce.tasks when calling RowSimilarityJob in RecommenderJob
------------------------------------------------------------------------------------
Key: MAHOUT-473
URL: https://issues.apache.org/jira/browse/MAHOUT-473
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.4
Reporter: Han Hui Wen
Fix For: 0.4
In RecommenderJob:
{code:title=RecommenderJob.java|borderStyle=solid}
int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor
   * this call to something like new DistributedRowMatrix(...).rowSimilarity(...) */
  try {
    RowSimilarityJob.main(new String[] {
        "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
        "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
        "--numberOfColumns", String.valueOf(numberOfUsers),
        "--similarityClassname", similarityClassname,
        "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
        "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
}
{code}
We do not pass the parameter -Dmapred.reduce.tasks when invoking RowSimilarityJob. As a result, all three of its sub-jobs run with a single map and a single reduce task, so they cannot scale.
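A minimal sketch of the proposed change, which simply forwards a reduce-task count alongside the other generic options. Reading the count from RecommenderJob's own configuration via getConf().get("mapred.reduce.tasks", "1") is an assumption for illustration, not existing Mahout code:

{code:title=RecommenderJob.java (sketch of proposed change)|borderStyle=solid}
try {
  RowSimilarityJob.main(new String[] {
      "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
      "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
      // Forward the reducer count so the three sub-jobs can scale out.
      // Sourcing it from this job's configuration is an assumption for illustration.
      "-Dmapred.reduce.tasks=" + getConf().get("mapred.reduce.tasks", "1"),
      "--numberOfColumns", String.valueOf(numberOfUsers),
      "--similarityClassname", similarityClassname,
      "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
      "--tempDir", tempDirPath.toString() });
} catch (Exception e) {
  throw new IllegalStateException("item-item-similarity computation failed", e);
}
{code}

Since the existing call already passes -Dmapred.input.dir and -Dmapred.output.dir as generic options, adding -Dmapred.reduce.tasks the same way should propagate to each of RowSimilarityJob's sub-jobs.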