[
https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718777#action_12718777
]
Deneche A. Hakim commented on MAHOUT-122:
-----------------------------------------
I ran some tests on several of the datasets used in Breiman's paper to compare
the results of the reference implementation with his.
The test procedure described in Breiman's paper is as follows:
* 10% of the dataset is set aside as a test set
* for each dataset we build two forests, one with m=int(log2(M)+1) (called
Random-Input) and one with m=1 (called Single-Input), where M is the number of
input attributes and m is the number of attributes tried at each node
* we use the forest that gave the lowest oob error estimate to compute the
test set error
* we also compute the test set error using the Single-Input forest, to show
that even with m=1 Random Forests give results comparable to those obtained
with greater values of m
* we compute the mean test set error over the individual trees of the forest
that gave the lowest oob error. This shows how a single Decision Tree performs
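The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the attached patch: the function names are hypothetical, the shuffle-based 10% split is one possible way to hold out the test set, and breaking oob ties in favour of Random-Input is an arbitrary choice.

```python
import math
import random


def random_input_m(num_attributes):
    """Attributes tried per node for the Random-Input forest: int(log2(M) + 1)."""
    return int(math.log2(num_attributes) + 1)


def split_holdout(rows, test_fraction=0.1, rng=None):
    """Shuffle a copy of the rows and set aside test_fraction of them for testing.

    Assumption: a plain random split; the original procedure may differ.
    """
    rng = rng or random.Random()
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def select_forest(oob_random_input, oob_single_input):
    """Name of the forest with the lowest oob error estimate.

    Assumption: ties go to Random-Input.
    """
    if oob_random_input <= oob_single_input:
        return "Random-Input"
    return "Single-Input"


def mean_tree_error(tree_errors):
    """Mean test set error over the individual trees of one forest."""
    return sum(tree_errors) / len(tree_errors)
```

For example, a dataset with a hypothetical M=9 attributes would use `random_input_m(9)` attributes per node for the Random-Input forest, while the Single-Input forest always uses m=1.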
In the following tables:
* Selection: test set error with the forest that gave the lowest oob error
* Single Input: test set error with the Single-Input forest
* One Tree: mean test set error of a single tree
*Breiman's results:*
|| Data || Selection || Single Input || One Tree ||
| glass | 20.6 | 21.2 | 36.9 |
| breast cancer | 2.9 | 2.7 | 6.3 |
| diabetes | 24.2 | 24.3 | 33.1 |
| sonar | 15.9 | 18.0 | 31.7 |
| ionosphere | 7.1 | 7.5 | 12.7 |
| vehicle | 25.8 | 26.4 | 33.1 |
| german | 24.4 | 26.2 | 33.3 |
*Reference Implementation results:*
I also included the mean system time each forest (Random-Input or
Single-Input) took to build.
|| Data || Selection || Single Input || One Tree || Mean RI Time || Mean SI Time ||
| glass | 24.8 | 23.9 | 41.2 | 9s 19 | 2s 667 |
| breast cancer | 2.8 | 2.7 | 5.8 | 2s 588 | 1s 60 |
| diabetes | 24.5 | 24.6 | 32.1 | 34s 875 | 10s 284 |
| sonar | 14.6 | 15.3 | 32.3 | 10s 89 | 2s 227 |
| ionosphere | 7.1 | 7.0 | 15.5 | 33s 190 | 6s 96 |
| vehicle | 25.3 | 26.4 | 33.7 | 42s 194 | 10s 21 |
| german | 23.15 | 25.27 | 32.8 | 10s 203 | 3s 654 |
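To make the comparison between the two tables easier to read, the per-dataset differences can be computed with a short script. This is a small sketch: the error values are copied from the tables above, while the names `BREIMAN`, `REFERENCE` and `error_differences` are made up for this example. The timing columns are left out.

```python
# Test set error (%) per dataset: (Selection, Single Input, One Tree).
COLUMNS = ("Selection", "Single Input", "One Tree")

BREIMAN = {
    "glass": (20.6, 21.2, 36.9),
    "breast cancer": (2.9, 2.7, 6.3),
    "diabetes": (24.2, 24.3, 33.1),
    "sonar": (15.9, 18.0, 31.7),
    "ionosphere": (7.1, 7.5, 12.7),
    "vehicle": (25.8, 26.4, 33.1),
    "german": (24.4, 26.2, 33.3),
}

REFERENCE = {
    "glass": (24.8, 23.9, 41.2),
    "breast cancer": (2.8, 2.7, 5.8),
    "diabetes": (24.5, 24.6, 32.1),
    "sonar": (14.6, 15.3, 32.3),
    "ionosphere": (7.1, 7.0, 15.5),
    "vehicle": (25.3, 26.4, 33.7),
    "german": (23.15, 25.27, 32.8),
}


def error_differences(reference, baseline):
    """Reference minus baseline error, in percentage points, per dataset and column."""
    return {
        name: {
            column: round(ref - base, 2)
            for column, ref, base in zip(COLUMNS, reference[name], baseline[name])
        }
        for name in baseline
    }


if __name__ == "__main__":
    for name, row in error_differences(REFERENCE, BREIMAN).items():
        print(name, row)
```

A positive difference means the reference implementation had a higher test set error than Breiman reports for that dataset and column.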
> Random Forests Reference Implementation
> ---------------------------------------
>
> Key: MAHOUT-122
> URL: https://issues.apache.org/jira/browse/MAHOUT-122
> Project: Mahout
> Issue Type: Task
> Components: Classification
> Affects Versions: 0.2
> Reporter: Deneche A. Hakim
> Attachments: 2w_patch.diff, RF reference.patch
>
> Original Estimate: 25h
> Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to
> understand, reference implementation of Random Forests (Building and
> Classification). The only requirement here is that "it works"