[ https://issues.apache.org/jira/browse/SPARK-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292467#comment-16292467 ]
Apache Spark commented on SPARK-22801:
--------------------------------------
User 'MLnick' has created a pull request for this issue:
https://github.com/apache/spark/pull/19991
> Allow FeatureHasher to specify numeric columns to treat as categorical
> ----------------------------------------------------------------------
>
> Key: SPARK-22801
> URL: https://issues.apache.org/jira/browse/SPARK-22801
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Nick Pentreath
> Assignee: Nick Pentreath
>
> {{FeatureHasher}}, added in SPARK-13964, always treats numeric-type columns as
> numbers and never as categorical features. It is quite common to have
> categorical features represented as numbers or codes (often, say, {{Int}}) in
> data sources.
> In order to hash these features as categorical, users must first explicitly
> convert them to strings, which is cumbersome.
> Add a new param {{categoricalCols}} which specifies the numeric columns that
> should be treated as categorical features.
> *Note:* while the reverse case is certainly possible (i.e. numeric features
> that are encoded as strings, which a user would like to treat as numeric),
> this is probably less likely, and this case won't be supported at this time.
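
To illustrate the difference the proposed param makes, here is a minimal pure-Python sketch of the two hashing modes. It is not Spark's implementation: the helper names {{hash_index}} and {{hash_row}}, the {{zip_code}} column, and the use of CRC32 as the hash are assumptions made for illustration (Spark's FeatureHasher uses MurmurHash3). The sketch only mirrors the documented behaviour: a numeric column hashes its column name and keeps the value, while a categorical column hashes "column=value" and stores an indicator of 1.0.

```python
import zlib


def hash_index(term: str, num_features: int) -> int:
    # Stand-in hash (CRC32) for illustration only; Spark uses MurmurHash3.
    return zlib.crc32(term.encode("utf-8")) % num_features


def hash_row(row: dict, num_features: int, categorical_cols=()) -> dict:
    """Hash one row into a sparse vector represented as {index: value}."""
    vec = {}
    for col, value in row.items():
        if value is None:
            continue  # skip missing values
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric and col not in categorical_cols:
            # Numeric mode: index comes from the column name, value is kept.
            term, weight = col, float(value)
        else:
            # Categorical mode: index comes from "column=value", indicator 1.0.
            term, weight = f"{col}={value}", 1.0
        idx = hash_index(term, num_features)
        vec[idx] = vec.get(idx, 0.0) + weight
    return vec


if __name__ == "__main__":
    row = {"zip_code": 94105}  # hypothetical integer-coded categorical column
    print(hash_row(row, 1 << 18))                                   # numeric
    print(hash_row(row, 1 << 18, categorical_cols={"zip_code"}))    # categorical
```

In numeric mode every zip code lands at the same index and differs only in magnitude, which is rarely meaningful for a code. Listing the column in {{categorical_cols}} gives each distinct code its own hashed term without first casting the column to a string.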