Hosur Narahari created SPARK-22021:
--------------------------------------
Summary: Add a feature transformation to accept a function and
apply it on all rows of dataframe
Key: SPARK-22021
URL: https://issues.apache.org/jira/browse/SPARK-22021
Project: Spark
Issue Type: New Feature
Components: ML
Affects Versions: 2.3.0
Reporter: Hosur Narahari
In ML pipelines we often generate derived features by applying a mathematical
or other operation to columns of a DataFrame, for example summing a few
columns into a new column, or computing the length of a text field such as a
message. We currently don't have an efficient way to handle such scenarios in
an ML pipeline.
By providing a transformer that accepts a function and applies it to the
specified input columns to generate a numeric output column, the user gains
the flexibility to derive features using any domain-specific logic.
Example:
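// GenFuncTransformer is the transformer proposed in this issue; it does not exist yet.
// The function is passed as a string and applied to the input columns of each row.
val function = "function(a,b) { return a+b;}"
val transformer = new GenFuncTransformer()
  .setInputCols(Array("v1", "v2"))
  .setOutputCol("result")
  .setFunction(function)
// toDF needs spark.implicits._, which spark-shell imports automatically.
val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
val result = transformer.transform(df)
result.show()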
+---+---+------+
| v1| v2|result|
+---+---+------+
|1.0|2.0|   3.0|
|3.0|4.0|   7.0|
+---+---+------+
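
To make the idea more concrete, here is a rough sketch of how such a
transformer might be built on the existing Transformer API. It is only an
illustration under stated assumptions, not the proposed implementation: the
class name RowFuncTransformer and its constructor are hypothetical, it takes a
Scala function rather than a function string, it assumes the input columns can
be cast to double and contain no nulls, and it does not support ML persistence.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{array, col, udf}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical stand-in for the proposed GenFuncTransformer (name and
// constructor are assumptions made for this sketch).
class RowFuncTransformer(
    override val uid: String,
    inputCols: Array[String],
    outputCol: String,
    f: Seq[Double] => Double) extends Transformer {

  def this(inputCols: Array[String], outputCol: String, f: Seq[Double] => Double) =
    this(Identifiable.randomUID("rowFunc"), inputCols, outputCol, f)

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    // Wrap the user function in a UDF that receives the input columns,
    // cast to double, packed into a single array column.
    val rowFunc = udf(f)
    val inputs = inputCols.map(c => col(c).cast(DoubleType))
    dataset.withColumn(outputCol, rowFunc(array(inputs: _*)))
  }

  override def transformSchema(schema: StructType): StructType = {
    inputCols.foreach { c =>
      require(schema.fieldNames.contains(c), s"Input column $c does not exist.")
    }
    require(!schema.fieldNames.contains(outputCol),
      s"Output column $outputCol already exists.")
    StructType(schema.fields :+ StructField(outputCol, DoubleType, nullable = true))
  }

  override def copy(extra: ParamMap): Transformer =
    new RowFuncTransformer(uid, inputCols, outputCol, f)
}

object RowFuncTransformerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("RowFuncTransformerExample")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
    // Sum the input columns of each row, as in the example above.
    val transformer =
      new RowFuncTransformer(Array("v1", "v2"), "result", (xs: Seq[Double]) => xs.sum)
    transformer.transform(df).show()

    spark.stop()
  }
}
```

Because the class extends Transformer, it could be used as a stage in a
Pipeline like any built-in feature transformer. Taking a Scala function keeps
the sketch short; accepting a function string, as in the example above, would
additionally need a way to parse or evaluate that string.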