[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-157:
------------------------------
Attachment: MAHOUT-157-CompactTransactionMapperFormat.patch
Tried running the 1.6 GB dense transaction dataset (webdocs) on a single-node
cluster. The mapper seems to be creating huge groups of transactions, so I
converted all the transactions to integers at the mapper stage.
The mapper output is too large for one node to handle. It seems a cluster of
at least 5-10 nodes would be needed to test the above dataset.
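To illustrate the integer conversion mentioned above, here is a minimal sketch of the general idea. It is not the code in the attached patch. The class name CompactTransactionSketch and the helpers buildItemIds and encode are hypothetical, and the ids are assigned in order of first appearance purely for simplicity (the patch may order them differently, for example by frequency).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompactTransactionSketch {

    // Hypothetical helper: give each distinct item a dense integer id,
    // assigned in order of first appearance.
    static Map<String, Integer> buildItemIds(List<List<String>> transactions) {
        Map<String, Integer> ids = new HashMap<>();
        for (List<String> transaction : transactions) {
            for (String item : transaction) {
                ids.putIfAbsent(item, ids.size());
            }
        }
        return ids;
    }

    // Hypothetical helper: replace item strings with their integer ids,
    // drop items that have no id, and sort the ids so the encoded
    // transaction has a canonical order.
    static int[] encode(List<String> transaction, Map<String, Integer> ids) {
        int[] out = new int[transaction.size()];
        int n = 0;
        for (String item : transaction) {
            Integer id = ids.get(item);
            if (id != null) {
                out[n++] = id;
            }
        }
        int[] trimmed = Arrays.copyOf(out, n);
        Arrays.sort(trimmed);
        return trimmed;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("milk", "bread"),
                Arrays.asList("bread", "beer", "milk"));
        Map<String, Integer> ids = buildItemIds(transactions);
        for (List<String> transaction : transactions) {
            System.out.println(Arrays.toString(encode(transaction, ids)));
        }
    }
}
```

An int[] per transaction is considerably smaller than the equivalent list of item strings, which is the motivation for encoding before the mapper emits its output.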
> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
> Key: MAHOUT-157
> URL: https://issues.apache.org/jira/browse/MAHOUT-157
> Project: Mahout
> Issue Type: New Feature
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.2
> Reporter: Robin Anil
> Assignee: Robin Anil
> Fix For: 0.2
>
> Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch,
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch,
> MAHOUT-157-codecleanup-javadocs.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-CompactTransactionMapperFormat.patch, MAHOUT-157-final.patch,
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch,
> MAHOUT-157-Oct-10.pfpgrowth.patch, MAHOUT-157-Oct-8.pfpgrowth.patch,
> MAHOUT-157-Oct-8.TestedMapReducePipeline.patch,
> MAHOUT-157-Oct-9.StreamingDBRead-Inprogress.patch,
> MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch,
> MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf