[ 
https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722696#comment-15722696
 ] 

Barry Becker commented on SPARK-11215:
--------------------------------------

This would be a good feature. It might be nice to add an optional 
"maxCategories" parameter, as VectorIndexer has. Any column found to have more 
than maxCategories distinct values would then be skipped, which avoids the 
work of indexing very high-cardinality columns.
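
To make the suggestion concrete, here is a minimal pure-Python sketch of the 
proposed behavior. It is not Spark code: the helper name index_columns and its 
max_categories argument are hypothetical and exist only for illustration. It 
assumes labels are ordered by descending frequency (as StringIndexer does by 
default) and, as a further assumption, breaks ties alphabetically.

```python
from collections import Counter


def index_columns(rows, input_cols, max_categories=None):
    """Hypothetical helper (not a Spark API): build a label -> index map
    for each column, skipping columns with too many distinct values."""
    mappings = {}
    skipped = []
    for col in input_cols:
        # Count how often each distinct label occurs in this column.
        counts = Counter(row[col] for row in rows)
        if max_categories is not None and len(counts) > max_categories:
            # Too many distinct values: do not build a mapping for it.
            skipped.append(col)
            continue
        # Most frequent label gets index 0; ties are broken alphabetically
        # (the tie-break rule is an assumption of this sketch).
        labels = sorted(counts, key=lambda label: (-counts[label], label))
        mappings[col] = {label: float(i) for i, label in enumerate(labels)}
    return mappings, skipped


rows = [
    {"color": "red", "user_id": "u1"},
    {"color": "blue", "user_id": "u2"},
    {"color": "red", "user_id": "u3"},
]
mappings, skipped = index_columns(rows, ["color", "user_id"], max_categories=2)
print(mappings)
print(skipped)
```

In this toy data "color" has two distinct values and is indexed, while 
"user_id" has three and is skipped. Note that the sketch still makes a counting 
pass over every column; it only avoids building and applying the mapping for 
the skipped ones.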

> Add multiple columns support to StringIndexer
> ---------------------------------------------
>
>                 Key: SPARK-11215
>                 URL: https://issues.apache.org/jira/browse/SPARK-11215
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> Add multiple columns support to StringIndexer, then users can transform 
> multiple input columns to multiple output columns simultaneously. See 
> discussion SPARK-8418.


