[ https://issues.apache.org/jira/browse/MAHOUT-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898346#action_12898346 ]
Sean Owen commented on MAHOUT-474:
----------------------------------
I don't doubt that compression is a good idea. But it is up to the caller, not
the code. The Hadoop default is to not compress and we follow that. This is how
other jobs work.
But if you are finding a problem in passing arguments, you can identify that
and provide a patch that fixes argument passing.
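
As a rough illustration of what "up to the caller" could look like, here is a minimal sketch that builds the generic `-D` options a caller might put on the command line. The property names are the Hadoop 0.20-era ones (`mapred.output.compress`, `mapred.output.compression.type`, `mapred.output.compression.codec`). The class and method names are hypothetical, and it is an assumption that the job in question is launched through ToolRunner so that these options actually reach its configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout or Hadoop.
public class CompressionArgs {

    // Builds generic options in the documented "-D property=value" form.
    // Assumption: the job is run via ToolRunner/GenericOptionsParser,
    // which copies such options into the job configuration.
    static List<String> compressionOptions(String type, String codecClass) {
        List<String> opts = new ArrayList<>();
        opts.add("-D");
        opts.add("mapred.output.compress=true");
        opts.add("-D");
        // Compression type for SequenceFile output: NONE, RECORD or BLOCK.
        opts.add("mapred.output.compression.type=" + type);
        opts.add("-D");
        opts.add("mapred.output.compression.codec=" + codecClass);
        return opts;
    }

    public static void main(String[] args) {
        List<String> opts = compressionOptions(
            "BLOCK", "org.apache.hadoop.io.compress.GzipCodec");
        // Print the options so they can be pasted before the job's own arguments.
        System.out.println(String.join(" ", opts));
    }
}
```

If the options do not take effect when passed this way, that would point to the argument-passing problem mentioned above rather than to a missing compression default.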
> Should compress output of Job pairwiseSimilarity and Job asMatrix
> -----------------------------------------------------------------
>
> Key: MAHOUT-474
> URL: https://issues.apache.org/jira/browse/MAHOUT-474
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Han Hui Wen
> Assignee: Sean Owen
>
> !https://issues.apache.org/jira/secure/attachment/12451985/RowSimilarityJob-CooccurrencesMapper-SimilarityReducer.jpg!
> From the picture above, we can see that the output of pairwiseSimilarity is
> very large, so we should compress it.
> SequenceFileOutputFormat.setOutputCompressionType(job, style);
> SequenceFileOutputFormat.setCompressOutput(job, compress);
> SequenceFileOutputFormat.setOutputCompressorClass(job, codecClass);