[ 
https://issues.apache.org/jira/browse/MAHOUT-474?focusedWorklogId=991416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-991416
 ]

ASF GitHub Bot logged work on MAHOUT-474:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Nov/25 02:22
            Start Date: 13/Nov/25 02:22
    Worklog Time Spent: 10m 
      Work Description: guan404ming commented on PR #620:
URL: https://github.com/apache/mahout/pull/620#issuecomment-3524853540

   Hi @rich7420 @andrewmusselman, could you please take a look at this? Thanks!




Issue Time Tracking
-------------------

    Worklog Id:     (was: 991416)
    Time Spent: 20m  (was: 10m)

> Should compress output of Job pairwiseSimilarity and Job asMatrix
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-474
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-474
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean R. Owen
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> !https://issues.apache.org/jira/secure/attachment/12451985/RowSimilarityJob-CooccurrencesMapper-SimilarityReducer.jpg!
> From the picture above, we can see that the output of pairwiseSimilarity is
> very large, so we should compress it:
>       SequenceFileOutputFormat.setOutputCompressionType(job, style);
>       SequenceFileOutputFormat.setCompressOutput(job, compress);
>       SequenceFileOutputFormat.setOutputCompressorClass(job, codecClass);
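
The `style` argument in the calls above is a `SequenceFile.CompressionType` (`NONE`, `RECORD`, or `BLOCK`). As a rough, Hadoop-free illustration of why that choice can matter for output made of many small, similar records, the sketch below compares compressing each record on its own with compressing all records as one block. The class name, the tab-separated record layout, and the use of `java.util.zip.Deflater` in place of a Hadoop codec are assumptions made for illustration only. Real ratios depend on the codec and on the actual pairwiseSimilarity data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionStyleSketch {

    // Deflate a byte array and return the compressed size in bytes.
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical similarity record: "rowA<TAB>rowB<TAB>score".
    public static String record(int i) {
        return "row" + i + "\trow" + (i + 1) + "\t0." + (i % 100) + "\n";
    }

    // Total uncompressed size of n records.
    public static int rawSize(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += record(i).getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Analogue of RECORD style: every record is compressed separately.
    public static int perRecordSize(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += deflatedSize(record(i).getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Analogue of BLOCK style: all records are compressed together.
    public static int blockSize(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(record(i));
        }
        return deflatedSize(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("raw bytes:        " + rawSize(n));
        System.out.println("per-record bytes: " + perRecordSize(n));
        System.out.println("block bytes:      " + blockSize(n));
    }
}
```

In this sketch the block total comes out smaller than the per-record total, because a compressor that sees many similar records together can exploit the repetition between them, while per-record compression pays a fixed overhead on every tiny record.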



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
