[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-157:
------------------------------

    Attachment: MAHOUT-157-CompactTransactionMapperFormat.patch

Tried running the 1.6 GB dense transaction dataset (webdocs) on a single-node 
cluster. The mapper seems to be creating huge groups of transactions, so I 
converted all the transactions to integers at the mapper stage.
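
As a rough illustration of the idea (this is not the code in the attached 
patch), converting transactions to integers could look like the sketch below. 
The class name CompactTransactionSketch, the methods buildItemIds and encode, 
and the first-seen id assignment are all assumptions made for this example. A 
real PFP-growth implementation would more likely assign ids from a 
frequency-sorted item list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replace string items with dense integer ids so that
// each transaction emitted by a mapper is a compact int[] instead of text.
public class CompactTransactionSketch {

    // Assigns each distinct item an id in first-seen order (an assumption;
    // ids could equally be assigned by descending item frequency).
    static Map<String, Integer> buildItemIds(List<List<String>> transactions) {
        Map<String, Integer> ids = new HashMap<>();
        for (List<String> transaction : transactions) {
            for (String item : transaction) {
                ids.putIfAbsent(item, ids.size());
            }
        }
        return ids;
    }

    // Encodes one transaction as a sorted int[]; items without an id
    // (for example, items pruned as infrequent) are dropped.
    static int[] encode(List<String> transaction, Map<String, Integer> ids) {
        int[] buffer = new int[transaction.size()];
        int count = 0;
        for (String item : transaction) {
            Integer id = ids.get(item);
            if (id != null) {
                buffer[count++] = id;
            }
        }
        int[] encoded = Arrays.copyOf(buffer, count);
        Arrays.sort(encoded);
        return encoded;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
            Arrays.asList("milk", "bread"),
            Arrays.asList("bread", "eggs", "milk"));
        Map<String, Integer> ids = buildItemIds(transactions);
        for (List<String> transaction : transactions) {
            System.out.println(Arrays.toString(encode(transaction, ids)));
        }
    }
}
```

The integer form is smaller than the original text and cheaper to compare, 
but it only shrinks each record. It does not reduce the number of grouped 
transactions the mapper emits.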

The mapper output is too large for one node to handle. It seems a cluster of 
at least 5-10 nodes would be needed to test the above dataset.

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-codecleanup-javadocs.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-CompactTransactionMapperFormat.patch, MAHOUT-157-final.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, 
> MAHOUT-157-Oct-10.pfpgrowth.patch, MAHOUT-157-Oct-8.pfpgrowth.patch, 
> MAHOUT-157-Oct-8.TestedMapReducePipeline.patch, 
> MAHOUT-157-Oct-9.StreamingDBRead-Inprogress.patch, 
> MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch, 
> MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
