[
https://issues.apache.org/jira/browse/SPARK-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336132#comment-15336132
]
Taylor Baldwin commented on SPARK-15995:
----------------------------------------
I will be closing this issue. We found everything we need in Boosting Strategy.
I was unaware that Gradient Boosted Trees use a separate contract from
Random Forest / Decision Trees.
> Gradient Boosted Trees - handling of Categorical Inputs
> -------------------------------------------------------
>
> Key: SPARK-15995
> URL: https://issues.apache.org/jira/browse/SPARK-15995
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.6.1
> Reporter: Taylor Baldwin
>
> Gradient Boosted Trees appear to handle all inputs as continuous, or at least
> ordered, values. The trees returned in the Gradient Boosted model have nodes
> for categorical features containing a split that operates on a threshold
> rather than on the category values. This treats categorical values as if
> their ordering were significant, which is not reasonable to assume.
> Both Random Forest and Decision Trees accept the map of categorical feature
> info, while Gradient Boosted Trees do not. Random Forest and Decision Trees
> provide nodes for categorical features whose splits have the categories
> populated.
> According to the website documentation, Gradient Boosted Trees should handle
> categorical features, yet there is no apparent way to provide the
> categorical information needed to treat them as categories rather than
> continuous values.
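
To illustrate why the description above objects to threshold splits on categorical inputs, here is a minimal toy sketch in plain Python. It is not Spark's implementation and uses no Spark API; the data, the function names, and the error measure (misclassifications when each side predicts its majority label) are all assumptions made for illustration. It compares the best "value <= threshold" split with the best "value in subset of categories" split on a feature with three category indices.

```python
from itertools import combinations


def misclassified(rows, goes_left):
    """Errors when each side of the split predicts its majority label."""
    total = 0
    for side in (True, False):
        labels = [y for x, y in rows if goes_left(x) == side]
        if labels:
            ones = sum(labels)
            total += min(ones, len(labels) - ones)
    return total


def best_threshold_split(rows, categories):
    # Ordered treatment: a row goes left if its value <= threshold.
    return min(
        misclassified(rows, lambda x, t=t: x <= t) for t in categories[:-1]
    )


def best_subset_split(rows, categories):
    # Unordered treatment: a row goes left if its value is in a chosen subset.
    best = len(rows)
    for k in range(1, len(categories)):
        for subset in combinations(categories, k):
            chosen = set(subset)
            best = min(best, misclassified(rows, lambda x, s=chosen: x in s))
    return best


# Hypothetical data: (category_index, binary_label).
# Category 1 has label 0, while categories 0 and 2 have label 1.
rows = [(0, 1), (1, 0), (2, 1), (0, 1), (1, 0), (2, 1)]
categories = [0, 1, 2]

threshold_errors = best_threshold_split(rows, categories)
subset_errors = best_subset_split(rows, categories)
print("threshold split errors:", threshold_errors)
print("subset split errors:", subset_errors)
```

In this toy data no single threshold can isolate category 1, because it sits between the other two indices, whereas a subset split can place it alone on one side. A single threshold split therefore depends on how the categories happen to be numbered.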
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)