[ https://issues.apache.org/jira/browse/SPARK-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386296#comment-14386296 ]
Xusen Yin commented on SPARK-5895:
----------------------------------

Is it possible to select by type or by value? For example, in Pandas:

    data = raw_data.select_dtypes(include=[np.float64, np.int64])

Say I want to select all computable (numeric) types such as int, double, and float, or I want to select all columns that contain NaN.

> Add VectorSlicer
> ----------------
>
>                 Key: SPARK-5895
>                 URL: https://issues.apache.org/jira/browse/SPARK-5895
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>
> `VectorSlicer` takes a vector column and outputs a vector column with a subset
> of features.
> {code}
> val vs = new VectorSlicer()
>   .setInputCol("user")
>   .setSelectedFeatures("age", "salary")
>   .setOutputCol("usefulUserFeatures")
> {code}
> We should allow specifying selected features by indices and by names. It
> should preserve the output names.
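
For reference, the Pandas behaviour the comment alludes to can be sketched roughly as below. This is only an illustration of selection by type and by value in Pandas; the DataFrame contents and column names are made up, and nothing here is a proposed or existing `VectorSlicer` API.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
raw_data = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": np.array([31, 45, 27], dtype=np.int64),
    "salary": np.array([5200.0, np.nan, 4100.0], dtype=np.float64),
    "score": np.array([0.5, 0.7, 0.9], dtype=np.float64),
})

# Select by type: keep only the int64/float64 ("computable") columns.
numeric = raw_data.select_dtypes(include=[np.float64, np.int64])

# Select by value: keep only the columns that contain at least one NaN.
with_nan = raw_data.loc[:, raw_data.isna().any()]

print(list(numeric.columns))
print(list(with_nan.columns))
```

Here `numeric` drops the string column `name`, and `with_nan` keeps only `salary`, the one column with a missing value. An analogous selector for vector columns would presumably have to rely on per-feature attribute metadata rather than on column dtypes, since every entry of a vector is a double.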