[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722696#comment-15722696 ]
Barry Becker commented on SPARK-11215:
--------------------------------------

This would be a good feature. It might be nice to add an optional parameter for "maxCategories" like VectorIndexer does. Any column found to have more than maxCategories distinct values would then be skipped. This would have the advantage of avoiding the work of indexing columns with huge numbers of distinct values.

> Add multiple columns support to StringIndexer
> ---------------------------------------------
>
>                 Key: SPARK-11215
>                 URL: https://issues.apache.org/jira/browse/SPARK-11215
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> Add multiple columns support to StringIndexer, then users can transform
> multiple input columns to multiple output columns simultaneously. See
> discussion SPARK-8418.