[
https://issues.apache.org/jira/browse/SPARK-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738954#comment-16738954
]
Hyukjin Kwon commented on SPARK-26579:
--------------------------------------
Let's ask questions on the mailing list rather than filing a JIRA here. You
could get a better answer there.
> SparkML DecisionTree, how does the algorithm identify categorical features?
> ---------------------------------------------------------------------------
>
> Key: SPARK-26579
> URL: https://issues.apache.org/jira/browse/SPARK-26579
> Project: Spark
> Issue Type: Question
> Components: ML
> Affects Versions: 2.4.0
> Environment: os: Centos7
> software: pyspark.
> Reporter: Xufeng Wang
> Priority: Major
>
> I am confused about decision trees and other tree-based models. My current
> project involves data with both nominal and continuous features. I have
> converted the nominal data to numeric values using the StringIndexer
> transformer from the ml.feature module. Then I assembled all the feature
> values into a vector-type column named features. The values in the feature
> vector, as far as I can see, are all of double datatype.
> I kept getting the "maxBins should be larger than the largest number for
> all categorical features" error. After correcting the maxBins size, I still
> see some features (continuous from the beginning) with values larger than
> my maxBins size. Since the pipeline works with a maxBins that is smaller
> than some continuous values, I conclude that the algorithm automatically
> picks which features are categorical and which ones are continuous. But how
> does it figure out which is which, when all of the features are of double
> datatype?
> Another question, if anyone can help: what type of tree does the Spark
> decision tree use? Is it CART or something else?
> Last question: what are the procedures for treating categorical features in
> tree-based algorithms?
> Thank you in advance.
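
For readers with the same question, here is a rough sketch of the mechanism, not the actual Spark code. StringIndexer marks its output column as nominal in the column's ML attribute metadata, VectorAssembler carries those attributes into the vector column, and the tree learners read that metadata to decide which vector slots are categorical. The double datatype itself plays no role. The function names and the sample metadata below are hypothetical and written by hand for illustration. In PySpark, a dict of this shape can usually be read from df.schema["features"].metadata, but treat the exact layout as an assumption to check against your own data.

```python
# Sketch: tell categorical from continuous features by reading ML attribute
# metadata attached to the assembled vector column.
# Assumption: the metadata follows the "ml_attr" layout shown below; the sample
# dict is a hand-written illustration, not output captured from Spark.


def categorical_arities(metadata):
    """Return {feature index: number of categories} for nominal attributes."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    arities = {}
    for attr in attrs.get("nominal", []):
        if "num_vals" in attr:
            arities[attr["idx"]] = attr["num_vals"]
        elif "vals" in attr:
            arities[attr["idx"]] = len(attr["vals"])
    return arities


def min_max_bins(metadata):
    """Smallest maxBins that covers the largest categorical feature."""
    arities = categorical_arities(metadata)
    return max(arities.values()) if arities else 0


# Hypothetical metadata: slots 0 and 2 came from StringIndexer (nominal),
# slot 1 is a plain numeric column (continuous).
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 1, "name": "income"}],
            "nominal": [
                {"idx": 0, "name": "city_idx", "vals": ["a", "b", "c"]},
                {"idx": 2, "name": "color_idx", "vals": ["red", "blue"]},
            ],
        },
        "num_attrs": 3,
    }
}

print(categorical_arities(example_metadata))  # {0: 3, 2: 2}
print(min_max_bins(example_metadata))  # 3
```

Under this reading, maxBins only has to cover the number of categories in the nominal slots. A continuous slot such as index 1 can hold values far larger than maxBins without triggering the error, which matches the behavior described in the report.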