[ https://issues.apache.org/jira/browse/SPARK-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Pentreath resolved SPARK-22801.
------------------------------------
Resolution: Fixed
Fix Version/s: 2.3.0
Issue resolved by pull request 19991
[https://github.com/apache/spark/pull/19991]
> Allow FeatureHasher to specify numeric columns to treat as categorical
> ----------------------------------------------------------------------
>
> Key: SPARK-22801
> URL: https://issues.apache.org/jira/browse/SPARK-22801
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Nick Pentreath
> Assignee: Nick Pentreath
> Fix For: 2.3.0
>
>
> {{FeatureHasher}}, added in SPARK-13964, always treats numeric columns as
> numeric features and never as categorical features. It is quite common for
> data sources to represent categorical features as numbers or codes (often
> {{Int}}).
> To hash these features as categorical, users must first explicitly convert
> them to strings, which is cumbersome.
> Add a new param {{categoricalCols}} that specifies the numeric columns to
> treat as categorical features.
> *Note*: while the reverse case is certainly possible (i.e. numeric features
> encoded as strings that a user would like to treat as numeric), it is
> probably less common and won't be supported at this time.
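
To illustrate the difference the description is getting at, here is a minimal pure-Python sketch of the hashing trick. It is not Spark's implementation: it assumes the rule that a numeric column is indexed by the hash of its column name and keeps its value, while a categorical column is indexed by the hash of "column=value" with an indicator value of 1.0. The function name hash_features, the column region_code, and the use of zlib.crc32 (a stand-in for the MurmurHash3 that Spark uses) are all assumptions made for the example.

```python
import zlib


def hash_features(row, num_features=2 ** 18, categorical_cols=()):
    """Hash one row (dict of column -> value) into a sparse {index: value} map.

    Illustrative only: crc32 stands in for the real hash function.
    """
    vec = {}
    for col, value in row.items():
        # bool is a subclass of int in Python; treat it as categorical here.
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric and col not in categorical_cols:
            # Numeric feature: index comes from the column name, value is kept.
            key, weight = col, float(value)
        else:
            # Categorical feature: index comes from "column=value", value is 1.0.
            key, weight = f"{col}={value}", 1.0
        idx = zlib.crc32(key.encode("utf-8")) % num_features
        # Colliding features are summed into the same slot.
        vec[idx] = vec.get(idx, 0.0) + weight
    return vec


# Default behaviour: the code 3 is hashed as a magnitude under one shared index.
print(hash_features({"region_code": 3}))
print(hash_features({"region_code": 7}))

# With the column listed as categorical, each code gets its own indicator slot.
print(hash_features({"region_code": 3}, categorical_cols={"region_code"}))
print(hash_features({"region_code": 7}, categorical_cols={"region_code"}))
```

In the first two calls both rows land on the same index and differ only in value, so the codes are read as magnitudes. In the last two calls each code maps to its own index with value 1.0, which is the effect users previously had to obtain by casting the column to a string.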