Re: DataFrame -- help with encoding factor variables

2015-04-06 Thread Xiangrui Meng
Before OneHotEncoder or LabelIndexer is merged, you can define an UDF to do the mapping. val labelToIndex = udf { ... } featureDF.withColumn(f3_dummy, labelToIndex(col(f3))) See instructions here

DataFrame -- help with encoding factor variables

2015-04-06 Thread Yana Kadiyska
Hi folks, currently have a DF that has a factor variable -- say gender. I am hoping to use the RandomForest algorithm on this data an it appears that this needs to be converted to RDD[LabeledPoint] first -- i.e. all features need to be double-encoded. I see