Hi,

You can use the Column functions provided by the Spark API:
https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html

Hope this helps.

Thanks,
Divya

On 17 November 2016 at 12:08, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> Hi,
> I have a sample like:
> +---+------+--------------------+
> |age|gender|             city_id|
> +---+------+--------------------+
> |   |     1|1042015:city_2044...|
> |90s|     2|1042015:city_2035...|
> |80s|     2|1042015:city_2061...|
> +---+------+--------------------+
>
> and the expectation is:
> "age": 90s -> 90, 80s -> 80
> "gender": 1 -> "male", 2 -> "female"
>
> I have two solutions:
>
> 1. Handle each column separately, and then join all by index:
>    val age = input.select("age").map(...)
>    val gender = input.select("gender").map(...)
>    val result = ...
>
> 2. Write a UDF for each column, and then use them together:
>    val result = input.select(ageUDF($"age"), genderUDF($"gender"))
>
> However, both are awkward.
>
> Does anyone have a better workflow?
> Write some custom Transformers and use a Pipeline?
>
> Thanks.
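A minimal sketch of what the suggested approach could look like (not from the original thread; the column and DataFrame names follow the sample above). The mapping logic is written as plain Scala functions, and the commented section shows how the same transformation could be expressed with built-in Column functions such as `regexp_replace` and `when` from `org.apache.spark.sql.functions`, avoiding hand-written UDFs entirely:

```scala
// Pure mapping logic, testable without a Spark runtime.
def mapAge(age: String): String =
  age.stripSuffix("s")                 // "90s" -> "90", "80s" -> "80"

def mapGender(code: String): String = code match {
  case "1"   => "male"
  case "2"   => "female"
  case other => other                  // leave unknown codes unchanged
}

// With a SparkSession in scope and `input` being the sample DataFrame,
// the same mapping can be written with built-in Column functions:
//
//   import org.apache.spark.sql.functions.{col, regexp_replace, when}
//
//   val result = input.select(
//     regexp_replace(col("age"), "s$", "").as("age"),
//     when(col("gender") === "1", "male")
//       .when(col("gender") === "2", "female")
//       .otherwise(col("gender")).as("gender"),
//     col("city_id"))
```

Keeping all columns in a single `select` avoids the index-join of solution 1, and the Column-function form keeps the whole expression visible to Catalyst for optimization, unlike opaque UDFs.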