Collocations: Eliminate in-memory frequency calculation
-------------------------------------------------------

                 Key: MAHOUT-317
                 URL: https://issues.apache.org/jira/browse/MAHOUT-317
             Project: Mahout
          Issue Type: Improvement
    Affects Versions: 0.3
            Reporter: Drew Farris
             Fix For: 0.3


see: 
http://www.lucidimagination.com/search/document/ae484d53e969250e/who_owns_mahout_bucket_on_s3

The collocation code currently uses maps in CollocCombiner and CollocReducer to
perform frequency calculations in memory, which can cause the process to exhaust
the heap when a large number of ngrams exist for any given subgram.
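To illustrate why this fails, here is a minimal, Hadoop-free sketch of map-based
counting for the values of a single subgram. It is not Mahout's actual
CollocReducer; the class and method names are hypothetical. The map keeps one
entry per distinct ngram, so its size grows with the number of distinct ngrams
that share the subgram.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of map-based frequency counting for one subgram.
 * Not Mahout's CollocReducer; names are illustrative only.
 */
public class InMemoryNgramCounter {

    // Counts every ngram seen for one subgram. The map holds one entry per
    // distinct ngram, so heap usage grows with the number of distinct ngrams
    // associated with this subgram.
    static Map<String, Integer> countNgrams(Iterable<String> ngramsForSubgram) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ngram : ngramsForSubgram) {
            counts.merge(ngram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Values a reducer might receive for the subgram "new".
        List<String> values = List.of("new york", "new york", "new jersey", "new york");
        Map<String, Integer> counts = countNgrams(values);
        System.out.println(counts.get("new york") + " " + counts.get("new jersey"));
    }
}
```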

Convert the code to use a composite key and secondary sort, which removes the
need for an in-memory map in the frequency calculations.
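A rough sketch of the idea, assuming a key made of (subgram, ngram) that sorts
by subgram first and ngram second. In a Hadoop job the framework would perform
this sort, and a partitioner and grouping comparator on the subgram alone would
route all of a subgram's ngrams to one reduce call; here a plain List sort
stands in for that, and all names are hypothetical rather than Mahout's. Because
identical ngrams arrive adjacent, the reducer only needs the current key and a
counter, not a map of every distinct ngram.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, Hadoop-free sketch of the composite key / secondary sort idea.
 * A List sort stands in for the framework's shuffle and sort.
 */
public class SecondarySortSketch {

    // Composite key: natural key (subgram) plus secondary key (ngram).
    record CompositeKey(String subgram, String ngram) {}

    // Sort order: by subgram, then by ngram, so identical ngrams for the
    // same subgram end up adjacent.
    static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.comparing(CompositeKey::subgram).thenComparing(CompositeKey::ngram);

    // Walks keys already sorted by SORT_ORDER and produces one
    // "subgram<TAB>ngram<TAB>count" line per run of equal keys. Counting state
    // is just the current key and one int; lines are collected in a list only
    // for this demo, where a real reducer would write each one out directly.
    static List<String> countSorted(List<CompositeKey> sorted) {
        List<String> out = new ArrayList<>();
        CompositeKey current = null;
        int count = 0;
        for (CompositeKey key : sorted) {
            if (key.equals(current)) {
                count++;
            } else {
                if (current != null) {
                    out.add(current.subgram() + "\t" + current.ngram() + "\t" + count);
                }
                current = key;
                count = 1;
            }
        }
        if (current != null) {
            out.add(current.subgram() + "\t" + current.ngram() + "\t" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>(List.of(
            new CompositeKey("new", "new york"),
            new CompositeKey("new", "new jersey"),
            new CompositeKey("york", "new york"),
            new CompositeKey("new", "new york")));
        keys.sort(SORT_ORDER); // stands in for the framework's sort phase
        countSorted(keys).forEach(System.out::println);
    }
}
```

Running main prints one tab-separated line per (subgram, ngram) pair: "new york"
is counted twice under the subgram "new" and once under "york", and "new jersey"
once under "new".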

