[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741495#action_12741495 ]
Ted Dunning commented on MAHOUT-145:
------------------------------------
These numbers are confusing. First, why does the number of trees vary like
this?
Second, the out-of-bag (oob) error jumps around a lot, with no obvious pattern.
Third, the times don't match what I would expect. In particular, KDD10 at
10 and 50 map tasks takes exactly the same amount of time.
My expectation would have been that running 20 map tasks would be almost twice
as fast as running 10, because we have 10 machines, each of which is dual core.
Running 50 map tasks should take about the same time as 20. We see that pattern
on KDD25, except that we don't have a data point for 50 maps.
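A rough sketch of that expectation, under assumptions that are mine and not measured: map tasks are equal-sized, each of the 20 cores runs one task at a time, and there is no startup or scheduling overhead. The function name and the `num_slots` parameter are made up for illustration.

```python
import math


def expected_relative_time(num_map_tasks, num_slots=20):
    """Idealized wall-clock time as a fraction of the single-task time.

    Assumes equal-sized map tasks, one task per slot at a time, and no
    startup or scheduling overhead. These are simplifications for
    illustration, not measurements from the cluster.
    """
    # Tasks run in "waves": each wave fills at most num_slots slots.
    waves = math.ceil(num_map_tasks / num_slots)
    # Each task processes 1/num_map_tasks of the data, so one wave costs
    # 1/num_map_tasks of the single-task time.
    return waves / num_map_tasks


if __name__ == "__main__":
    for tasks in (10, 20, 50):
        print(f"{tasks} map tasks: {expected_relative_time(tasks):.2f}")
```

In this idealized model, 10 tasks leave half of the cores idle, 20 tasks fill them in a single wave, and 50 tasks need three shorter waves, so 20 and 50 come out close to each other and both well ahead of 10.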
> PartialData mapreduce Random Forests
> ------------------------------------
>
> Key: MAHOUT-145
> URL: https://issues.apache.org/jira/browse/MAHOUT-145
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Deneche A. Hakim
> Priority: Minor
> Attachments: partial_August_10.patch, partial_August_2.patch,
> partial_August_9.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions
> of the data. That loses some of the solidity of the original method, but
> could actually do better if the splits exposed non-stationary behavior."