[ 
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741495#action_12741495
 ] 

Ted Dunning commented on MAHOUT-145:
------------------------------------


These are confusing numbers.  First, why does the number of trees vary like 
this?

Secondly, the out-of-bag (OOB) error jumps around a lot in confusing ways.

Thirdly, the times don't seem to match what I would expect.  Moreover, KDD10 at 
10 and 50 map tasks take exactly the same amount of time.

My expectation would have been that running 20 map tasks would be almost twice 
as fast as running 10, because we have 10 machines, each of which is dual core. 
Running 50 map tasks should take about the same time as 20. We see that pattern 
on KDD25, except that we don't have a data point for 50 maps.
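To make that expectation concrete, here is a rough model of wall-clock time. It 
assumes (hypothetically) that the total work divides evenly across map tasks, 
that the cluster offers 10 x 2 = 20 map slots, and that tasks run in waves of at 
most 20, with no task startup overhead, stragglers, or data-locality effects. The 
function name and the "work units" are made up for illustration.

```python
import math


def estimated_wall_time(total_work, num_maps, machines=10, cores_per_machine=2):
    """Idealized job time (hypothetical model): work is split evenly across
    map tasks, and tasks are scheduled in waves limited by the slot count."""
    slots = machines * cores_per_machine   # concurrent map slots in the cluster
    per_task = total_work / num_maps       # each map handles an equal share
    waves = math.ceil(num_maps / slots)    # rounds of scheduling needed
    return waves * per_task


if __name__ == "__main__":
    for maps in (10, 20, 50):
        print(maps, estimated_wall_time(100.0, maps))
```

Under this idealized model, 20 maps take half the time of 10, while 50 maps need 
three waves and come out slightly slower than 20 (6.0 vs 5.0 work units), which 
is roughly the "about the same" expectation; real per-task overhead would push 
50 maps further behind.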

> PartialData mapreduce Random Forests
> ------------------------------------
>
>                 Key: MAHOUT-145
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-145
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>         Attachments: partial_August_10.patch, partial_August_2.patch, 
> partial_August_9.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions 
> of the data. That loses some of the solidity of the original method, but 
> could actually do better if the splits exposed non-stationary behavior."
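A minimal sketch of the idea quoted above, assuming the data is cut into 
contiguous partitions (one per mapper) and each partition grows its own trees 
from its local rows only; the forest then predicts by majority vote over all 
trees. This is not Mahout's implementation: build_stump is a hypothetical 
stand-in for a real decision-tree builder (a one-split stump on a random 
feature, trained on a bootstrap sample), and all function names are invented 
for illustration.

```python
import random
from collections import Counter


def split_into_partitions(rows, num_partitions):
    """Contiguous chunks, loosely like input splits handed to mappers."""
    size = -(-len(rows) // num_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def build_stump(rows, rng):
    """Hypothetical stand-in for a tree builder: a one-split decision stump
    on a random feature, trained on a bootstrap sample of the partition.
    Each row is (features_tuple, label)."""
    sample = [rng.choice(rows) for _ in rows]          # bagging within the partition
    feature = rng.randrange(len(sample[0][0]))
    threshold = sorted(x[feature] for x, _ in sample)[len(sample) // 2]
    left = Counter(y for x, y in sample if x[feature] < threshold)
    right = Counter(y for x, y in sample if x[feature] >= threshold)
    overall = Counter(y for _, y in sample).most_common(1)[0][0]
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda x: left_label if x[feature] < threshold else right_label


def build_partial_forest(rows, num_partitions, trees_per_partition, seed=0):
    """Each partition grows its own trees from its local data only."""
    rng = random.Random(seed)
    forest = []
    for part in split_into_partitions(rows, num_partitions):
        forest.extend(build_stump(part, rng) for _ in range(trees_per_partition))
    return forest


def predict(forest, x):
    """Majority vote over all trees, regardless of which partition built them."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

If a partition happens to hold only one class, every tree it builds predicts 
that class; that is where some of the "solidity" is lost when the data is 
ordered rather than shuffled, and also where non-stationary splits could help.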

-- 
This message is automatically generated by JIRA.