[ https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720847#action_12720847 ]
Ted Dunning commented on MAHOUT-122:
------------------------------------
Just for grins, I plotted the max memory usage versus number of instances. The
relationship (so far) is very much a straight line. The fit that R gives me is
450 bytes (se=25) per data instance with a fixed overhead of 23MB (se=14MB).
At well under two standard errors, the fixed overhead can't really be
distinguished from zero, but it looks right.

The 450 bytes per training instance seems a little bit high, but I don't know
the data well, so it might be pretty reasonable. The original data size was
about 100 bytes per instance.
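
For anyone who wants to play with these numbers, here is a rough sketch in Python of the arithmetic behind the fit. It only uses the figures quoted above. The function names, the decimal-megabyte convention, the reading of "about 100 bytes" as a per-instance size, and the one-million-instance example are all assumptions made for illustration; none of this is code from the patch.

```python
# Figures quoted in the comment above (R's linear fit of max memory
# versus number of instances). Everything else is an illustrative assumption.
BYTES_PER_INSTANCE = 450.0       # fitted slope
BYTES_PER_INSTANCE_SE = 25.0     # standard error of the slope
FIXED_OVERHEAD_MB = 23.0         # fitted intercept
FIXED_OVERHEAD_SE_MB = 14.0      # standard error of the intercept
RAW_BYTES_PER_INSTANCE = 100.0   # "about 100 bytes", read as per instance (assumption)

BYTES_PER_MB = 1_000_000         # assumption: decimal megabytes


def predicted_memory_mb(n_instances):
    """Memory predicted by the straight-line fit, in MB."""
    return FIXED_OVERHEAD_MB + BYTES_PER_INSTANCE * n_instances / BYTES_PER_MB


def t_ratio(estimate, standard_error):
    """Estimate divided by its standard error (a rough significance check)."""
    return estimate / standard_error


slope_t = t_ratio(BYTES_PER_INSTANCE, BYTES_PER_INSTANCE_SE)
intercept_t = t_ratio(FIXED_OVERHEAD_MB, FIXED_OVERHEAD_SE_MB)
expansion = BYTES_PER_INSTANCE / RAW_BYTES_PER_INSTANCE

if __name__ == "__main__":
    n = 1_000_000  # hypothetical instance count, not taken from the issue
    print(f"predicted memory for {n} instances: {predicted_memory_mb(n):.0f} MB")
    print(f"slope t-ratio: {slope_t:.1f}")
    print(f"intercept t-ratio: {intercept_t:.2f}")
    print(f"in-memory size relative to raw data: {expansion:.1f}x")
```

The slope is many standard errors away from zero, while the intercept is less than two, which is why the fixed overhead is hard to tell apart from zero. Under the per-instance reading of the raw size, the in-memory representation comes out at about 4.5 times the original data.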
> Random Forests Reference Implementation
> ---------------------------------------
>
> Key: MAHOUT-122
> URL: https://issues.apache.org/jira/browse/MAHOUT-122
> Project: Mahout
> Issue Type: Task
> Components: Classification
> Affects Versions: 0.2
> Reporter: Deneche A. Hakim
> Attachments: 2w_patch.diff, 3w_patch.diff, RF reference.patch
>
> Original Estimate: 25h
> Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to
> understand, reference implementation of Random Forests (Building and
> Classification). The only requirement here is that "it works".