[ 
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740876#action_12740876
 ] 

Deneche A. Hakim commented on MAHOUT-145:
-----------------------------------------

OK, here is what I did:

* load the KDD 10% dataset
* partition the data among P partitions
* for each partition p, run the reference implementation builder, which gives 
a forest Fp and a set of predictions Cp
* for each partition p
 ** for each forest Fk where k <> p, classify the instances of partition p and 
update Cp
* compute the out-of-bag (OOB) error

I then ran the test with num trees = 100 and num maps = 2, 10, and 50, and got 
almost exactly the same results as Partial Builder. Conclusion: there is no 
*visible* bug in the second step of Partial Builder. 
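
For readers who want to follow the steps listed above, here is a minimal Python sketch of the same procedure. It is not the Mahout code: build_stump is a hypothetical one-split stand-in for the reference tree builder, all function names are made up for illustration, and splitting the total number of trees evenly among the partitions is an assumption.

```python
import random
from collections import Counter

def build_stump(sample, rng):
    """Hypothetical stand-in for the reference tree builder: a one-split stump."""
    feature = rng.randrange(len(sample[0][0]))
    threshold = rng.choice(sample)[0][feature]
    left = Counter(y for x, y in sample if x[feature] <= threshold)
    right = Counter(y for x, y in sample if x[feature] > threshold)
    overall = Counter(y for _, y in sample).most_common(1)[0][0]
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda x: left_label if x[feature] <= threshold else right_label

def build_forest(partition, num_trees, rng):
    """Step 1: build a bagged forest Fp on one partition and collect Cp,
    the votes each instance gets from trees that did not train on it."""
    forest = []
    votes = [Counter() for _ in partition]
    n = len(partition)
    for _ in range(num_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_stump([partition[i] for i in bag], rng)
        forest.append(tree)
        for i in set(range(n)) - set(bag):  # out-of-bag instances only
            votes[i][tree(partition[i][0])] += 1
    return forest, votes

def partial_oob_error(data, num_partitions, num_trees, seed=0):
    rng = random.Random(seed)
    partitions = [data[p::num_partitions] for p in range(num_partitions)]
    # Assumption: the total number of trees is split evenly among partitions.
    trees_each = max(1, num_trees // num_partitions)
    built = [build_forest(part, trees_each, rng) for part in partitions]

    # Step 2: every forest Fk with k != p classifies the instances of
    # partition p and adds its votes to Cp. Those trees never saw partition p.
    for p, part in enumerate(partitions):
        votes_p = built[p][1]
        for k, (forest_k, _) in enumerate(built):
            if k == p:
                continue
            for i, (x, _) in enumerate(part):
                for tree in forest_k:
                    votes_p[i][tree(x)] += 1

    # Step 3: OOB error = fraction of voted-on instances whose majority
    # vote disagrees with the true label.
    errors = total = 0
    for part, (_, votes) in zip(partitions, built):
        for (_, label), vote in zip(part, votes):
            if vote:
                total += 1
                errors += vote.most_common(1)[0][0] != label
    return errors / total if total else 0.0

if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy data (not KDD): one feature, label is 1 when the feature exceeds 0.5.
    toy = [((v,), int(v > 0.5)) for v in (data_rng.random() for _ in range(200))]
    for maps in (2, 10, 50):
        print(maps, partial_oob_error(toy, maps, 100))
```

With a single partition, step 2 does nothing and the result reduces to the usual out-of-bag estimate of one forest, which makes it easy to compare the two code paths on the same data.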

> PartialData mapreduce Random Forests
> ------------------------------------
>
>                 Key: MAHOUT-145
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-145
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>         Attachments: partial_August_2.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions 
> of the data. That loses some of the solidity of the original method, but 
> could actually do better if the splits exposed non-stationary behavior."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
