[ https://issues.apache.org/jira/browse/MAHOUT-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ikumasa Mukai updated MAHOUT-943:
---------------------------------
Attachment: MAHOUT-943.patch
I have attached a patch.
Following Deneche-san's advice, I added a mechanism for configuring the
TreeBuilder through an XML file, for example:
{noformat}
<?xml version="1.0"?>
<configuration>
<treeBuilder
class="org.apache.mahout.classifier.df.builder.DecisionTreeBuilder">
<igSplit class="org.apache.mahout.classifier.df.split.ClassificationSplit"/>
<m>5</m>
</treeBuilder>
</configuration>
{noformat}
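For illustration, a file of this shape could be read with the JDK's DOM parser roughly as follows. This is only a sketch and not the loader in the attached patch: DfConfigSketch and TreeBuilderConfig are hypothetical names, and the reading of <m> as an integer tuning value (presumably the number of attributes tried at each node) is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DfConfigSketch {

  /** Hypothetical holder for the three values found in the sample config. */
  static final class TreeBuilderConfig {
    final String builderClass;
    final String igSplitClass;
    final int m;

    TreeBuilderConfig(String builderClass, String igSplitClass, int m) {
      this.builderClass = builderClass;
      this.igSplitClass = igSplitClass;
      this.m = m;
    }
  }

  /** Parses the XML text; any parse problem is rethrown as an unchecked exception. */
  public static TreeBuilderConfig parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      // First <treeBuilder> element and its children.
      Element builder = (Element) doc.getElementsByTagName("treeBuilder").item(0);
      Element igSplit = (Element) builder.getElementsByTagName("igSplit").item(0);
      String m = builder.getElementsByTagName("m").item(0).getTextContent().trim();
      return new TreeBuilderConfig(
          builder.getAttribute("class"),
          igSplit.getAttribute("class"),
          Integer.parseInt(m));
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot read tree builder config", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<?xml version=\"1.0\"?>"
        + "<configuration>"
        + "<treeBuilder class=\"org.apache.mahout.classifier.df.builder.DecisionTreeBuilder\">"
        + "<igSplit class=\"org.apache.mahout.classifier.df.split.ClassificationSplit\"/>"
        + "<m>5</m>"
        + "</treeBuilder>"
        + "</configuration>";
    TreeBuilderConfig config = parse(xml);
    System.out.println(config.builderClass + " / " + config.igSplitClass + " / m=" + config.m);
  }
}
```

A real loader would then instantiate the named classes (for example by reflection) and pass the values on to the builder; that step is left out here because it depends on the patch.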
ClassificationSplit is a sample splitter that uses the average value, rather
than the best-IG attribute value itself, as the split point.
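The difference between the two choices can be sketched as below. This is not the code of OptIgSplit or ClassificationSplit: MidpointSplitSketch and its methods are hypothetical, and it assumes one possible reading of "the 2nd value", namely the next smaller distinct attribute value below the best-IG value. Samples with value < threshold go to the low side.

```java
import java.util.Arrays;

class MidpointSplitSketch {

  /** Shannon entropy (in bits) of an array of class counts. */
  static double entropy(int[] counts) {
    int total = 0;
    for (int c : counts) total += c;
    if (total == 0) return 0.0;
    double h = 0.0;
    for (int c : counts) {
      if (c > 0) {
        double p = (double) c / total;
        h -= p * Math.log(p) / Math.log(2.0);
      }
    }
    return h;
  }

  /** Information gain of the binary split "value < threshold". */
  static double infoGain(double[] values, int[] labels, int numLabels, double threshold) {
    int[] all = new int[numLabels];
    int[] lo = new int[numLabels];
    int[] hi = new int[numLabels];
    int nLo = 0;
    for (int i = 0; i < values.length; i++) {
      all[labels[i]]++;
      if (values[i] < threshold) {
        lo[labels[i]]++;
        nLo++;
      } else {
        hi[labels[i]]++;
      }
    }
    double n = values.length;
    return entropy(all) - (nLo / n) * entropy(lo) - ((n - nLo) / n) * entropy(hi);
  }

  /** Index (into the sorted distinct values) of the value with the best IG. */
  static int bestIndex(double[] distinct, double[] values, int[] labels, int numLabels) {
    if (distinct.length < 2) {
      throw new IllegalArgumentException("need at least two distinct values");
    }
    int best = 1;
    double bestGain = Double.NEGATIVE_INFINITY;
    // Index 0 is skipped: no sample is smaller than the minimum value.
    for (int i = 1; i < distinct.length; i++) {
      double gain = infoGain(values, labels, numLabels, distinct[i]);
      if (gain > bestGain) {
        bestGain = gain;
        best = i;
      }
    }
    return best;
  }

  /** Strict choice: the attribute value with the best IG is the split point. */
  public static double bestValueSplit(double[] values, int[] labels, int numLabels) {
    double[] distinct = Arrays.stream(values).distinct().sorted().toArray();
    return distinct[bestIndex(distinct, values, labels, numLabels)];
  }

  /** Averaged choice: midpoint of the best-IG value and the next smaller value. */
  public static double midpointSplit(double[] values, int[] labels, int numLabels) {
    double[] distinct = Arrays.stream(values).distinct().sorted().toArray();
    int best = bestIndex(distinct, values, labels, numLabels);
    return (distinct[best - 1] + distinct[best]) / 2.0;
  }

  public static void main(String[] args) {
    double[] values = {1.0, 2.0, 8.0, 9.0};
    int[] labels = {0, 0, 1, 1};
    System.out.println("best-value split: " + bestValueSplit(values, labels, 2));
    System.out.println("midpoint split:   " + midpointSplit(values, labels, 2));
  }
}
```

On the training data both thresholds produce the same partition; the averaged threshold only changes how unseen values that fall between the two neighbouring training values are routed.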
{noformat}
./hadoop jar $MAHOUT_HOME/mahout-examples-0.6-SNAPSHOT-job.jar \
org.apache.mahout.classifier.df.mapreduce.BuildForest \
-Dmapred.max.split.size=1874231 \
-d $KDD_DATA/KDDTrain.data \
-ds $KDD_DATA/KDDTrain+.info \
-c $MAHOUT_HOME/conf/df-config.xml \
-p -t 100 -o $KDD_DATA/model
{noformat}
I added a "-c" parameter to BuildForest. It should point to the XML configuration file.
> Improve the way to make the split point on DF.
> ----------------------------------------------
>
> Key: MAHOUT-943
> URL: https://issues.apache.org/jira/browse/MAHOUT-943
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Reporter: Ikumasa Mukai
> Labels: DecisionForest
> Attachments: MAHOUT-943.patch
>
>
> numericalSplit() in OptIgSplit uses the attribute value with the best
> information gain (IG) as the split point.
> I think this is a little too strict; in some situations it would be better
> to use the average of the best-IG value and the 2nd value.