[jira] Commented: (MAHOUT-474) Should compress output of Job pairwiseSimilarity and Job asMatrix

Sean Owen (JIRA) Fri, 13 Aug 2010 10:59:42 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898346#action_12898346 ]


Sean Owen commented on MAHOUT-474:
----------------------------------

I don't doubt that compression is a good idea. But whether to compress is up
to the caller, not the code. The Hadoop default is not to compress output, and
we follow that, as the other jobs do.

But if you are finding a problem in passing arguments, you can identify that 
and provide a patch that fixes argument passing.
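
As a rough illustration of what "up to the caller" could look like, here is a minimal sketch that builds the generic `-D` options a caller might pass on the command line to request compressed output. It assumes the job is launched in a way that honors Hadoop generic options (which is exactly the argument passing in question here), and it uses the old-style property names from Hadoop releases of this era; newer releases rename them. The class and method names (`CompressionArgs`, `build`) are hypothetical and not part of Mahout or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout or Hadoop: it only assembles
// command-line tokens and does not touch any Hadoop API.
final class CompressionArgs {

    // Old-style property names (Hadoop 0.20 era); newer releases rename them.
    static final String COMPRESS = "mapred.output.compress";
    static final String CODEC = "mapred.output.compression.codec";
    static final String TYPE = "mapred.output.compression.type";

    // Builds "-D key=value" pairs that switch output compression on,
    // select the codec class, and set the SequenceFile compression type.
    static List<String> build(String codecClass, String compressionType) {
        List<String> args = new ArrayList<>();
        addProperty(args, COMPRESS, "true");
        addProperty(args, CODEC, codecClass);
        addProperty(args, TYPE, compressionType);
        return args;
    }

    // Appends one generic option as two tokens: "-D" and "key=value".
    private static void addProperty(List<String> args, String key, String value) {
        args.add("-D");
        args.add(key + "=" + value);
    }

    public static void main(String[] argv) {
        // Example values: gzip codec with block-level compression.
        List<String> args = build("org.apache.hadoop.io.compress.GzipCodec", "BLOCK");
        System.out.println(String.join(" ", args));
    }
}
```

The resulting tokens would go before the job's own arguments. Whether a given job actually picks them up depends on how it parses its arguments, so this does not replace checking the argument passing itself.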

> Should compress output of Job pairwiseSimilarity and Job asMatrix
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-474
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-474
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>
> !https://issues.apache.org/jira/secure/attachment/12451985/RowSimilarityJob-CooccurrencesMapper-SimilarityReducer.jpg!
> From the picture above, we can see that the output of pairwiseSimilarity is
> very large, so we should compress it.
>       SequenceFileOutputFormat.setOutputCompressionType(job, style);
>       SequenceFileOutputFormat.setCompressOutput(job, compress);
>       SequenceFileOutputFormat.setOutputCompressorClass(job, codecClass);

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
