[ https://issues.apache.org/jira/browse/SPARK-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-22801:
------------------------------------

    Assignee: Apache Spark  (was: Nick Pentreath)

> Allow FeatureHasher to specify numeric columns to treat as categorical
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22801
>                 URL: https://issues.apache.org/jira/browse/SPARK-22801
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Nick Pentreath
>            Assignee: Apache Spark
>
> {{FeatureHasher}}, added in SPARK-13964, always treats numeric columns as
> numbers and never as categorical features. However, categorical features are
> commonly represented as numbers or codes (often, say, {{Int}}) in data
> sources.
> To hash these features as categorical, users must first explicitly convert
> them to strings, which is cumbersome.
> Add a new param, {{categoricalCols}}, that specifies which numeric columns
> should be treated as categorical features.
> *Note:* the reverse case is certainly possible (i.e. numeric features encoded
> as strings that a user would like to treat as numeric), but it is probably
> less common, and it won't be supported at this time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
