[
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739584#action_12739584
]
Deneche A. Hakim commented on MAHOUT-145:
-----------------------------------------
More tests on my laptop, using the KDD 10% dataset:
|| Num Map Tasks || Num trees || In-Mem build time || Partial build time || In-Mem oob error || Partial oob error ||
| 2 | 10 | 0h 2m 44s 635 | 0h 1m 37s 249 | 3.11E-4 | 0.63 |
| 2 | 100 | 0h 11m 57s 389 | 0h 5m 52s 22 | 2.63E-4 | 0.63 |
| 2 | 200 | 0h 24m 17s 81 | 0h 10m 46s 735 | 2.65E-4 | 0.63 |
| 2 | 400 | 0h 47m 24s 519 | 0h 21m 28s 939 | 2.57E-4 | 0.63 |
| 5 | 10 | 0h 2m 19s 742 | 0h 0m 59s 211 | 4.92E-4 | 0.58 |
| 5 | 100 | 0h 14m 10s 964 | 0h 2m 32s 969 | 2.42E-4 | 0.58 |
| 5 | 200 | 0h 27m 12s 29 | 0h 4m 18s 984 | 2.59E-4 | 0.58 |
| 5 | 400 | 0h 52m 29s 179 | 0h 8m 9s 980 | 2.42E-4 | 0.58 |
| 10 | 10 | 0h 3m 8s 587 | 0h 1m 12s 826 | 5.41E-4 | 0.50 |
| 10 | 100 | 0h 13m 42s 344 | 0h 2m 10s 523 | 2.63E-4 | 0.54 |
| 10 | 200 | 0h 24m 22s 871 | 0h 3m 0s 816 | 2.57E-4 | 0.51 |
| 10 | 400 | 0h 49m 39s 381 | 0h 4m 56s 698 | 2.53E-4 | 0.51 |
| 20 | 10 | | | | |
| 20 | 100 | 0h 15m 20s 24 | 0h 2m 34s 573 | 2.42E-4 | 0.45 |
| 20 | 200 | 0h 29m 43s 385 | 0h 3m 7s 545 | 2.55E-4 | 0.45 |
| 20 | 400 | 0h 50m 43s 957 | 0h 4m 12s 662 | 2.55E-4 | 0.45 |
| 50 | 10 | | | | |
| 50 | 100 | 0h 20m 35s 45 | 0h 3m 52s 244 | 2.46E-4 | 0.43 |
| 50 | 200 | 0h 32m 26s 342 | 0h 4m 24s 853 | 2.48E-4 | 0.43 |
| 50 | 400 | 0h 55m 28s 281 | 0h 5m 5s 999 | 2.51E-4 | 0.43 |
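
To compare the two build-time columns more directly, the timings can be converted to seconds and divided. The snippet below is only a rough sketch: it assumes the last field of each timing (e.g. the "635" in "0h 2m 44s 635") is milliseconds, and the helper names parse_duration and speedup are made up for this illustration, not part of the patch.

```python
import re

def parse_duration(text):
    """Convert a timing such as '0h 2m 44s 635' to seconds.

    Assumption: the trailing number is milliseconds.
    """
    m = re.fullmatch(r"\s*(\d+)h\s+(\d+)m\s+(\d+)s\s+(\d+)\s*", text)
    if m is None:
        raise ValueError("unrecognised duration: %r" % text)
    hours, minutes, seconds, millis = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0

def speedup(in_mem, partial):
    """Ratio of in-memory build time to partial build time."""
    return parse_duration(in_mem) / parse_duration(partial)

# Two rows copied from the table above (400 trees each).
rows = [
    (2, "0h 47m 24s 519", "0h 21m 28s 939"),
    (50, "0h 55m 28s 281", "0h 5m 5s 999"),
]
for num_maps, in_mem, partial in rows:
    print("%d map tasks: partial build is %.2fx faster"
          % (num_maps, speedup(in_mem, partial)))
```

The ratio covers build time only; it says nothing about the difference between the two oob error columns.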
> PartialData mapreduce Random Forests
> ------------------------------------
>
> Key: MAHOUT-145
> URL: https://issues.apache.org/jira/browse/MAHOUT-145
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Deneche A. Hakim
> Priority: Minor
> Attachments: partial_August_2.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions
> of the data. That loses some of the solidity of the original method, but
> could actually do better if the splits exposed non-stationary behavior."
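
The idea quoted above can be illustrated with a toy sketch. This is not the code from the attached patch: a one-split decision stump stands in for a full decision tree, the data is cut into contiguous chunks as a stand-in for map-task input splits, and all names (train_stump, build_partial_forest, predict) are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-split 'tree' on rows of (features, label).

    A stump is used here only to keep the sketch short.
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != l_label for y in left)
                      + sum(y != r_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_label, r_label)
    if best is None:
        # No usable split: always predict the majority label.
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return lambda x: label
    _, f, t, l_label, r_label = best
    return lambda x: l_label if x[f] <= t else r_label

def build_partial_forest(data, num_partitions, trees_per_partition, seed=0):
    """Build trees per partition; each tree sees only its own partition."""
    rng = random.Random(seed)
    size = -(-len(data) // num_partitions)  # ceiling division
    forest = []
    for start in range(0, len(data), size):  # at most num_partitions chunks
        partition = data[start:start + size]
        for _ in range(trees_per_partition):
            # Bootstrap sample drawn from this partition only.
            sample = [rng.choice(partition) for _ in partition]
            forest.append(train_stump(sample))
    return forest

def predict(forest, x):
    """Majority vote over all trees from all partitions."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

A possible usage, with synthetic data: build `data = [((v,), int(v > 0.5)) for v in values]`, call `build_partial_forest(data, 4, 5)`, then `predict(forest, (0.9,))`. Because no tree ever sees the whole dataset, each one reflects only its own portion, which is the trade-off the quote describes.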