Hi,

I have searched around but could not find a satisfying answer to this question: 
what is the best way to do a complex transformation on a dataframe column?

For example, I have a dataframe with the following schema and a function that 
has pretty complex logic to format addresses. I would like to use the function 
to format each address and store the output as an additional column in the 
dataframe. What is the best way to do it? Use DataFrame.map? Define a UDF? A 
code example would be appreciated.

Input dataframe:
root
 |-- ID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- PhoneNumber: string (nullable = true)
 |-- Address: string (nullable = true)

Output dataframe:
root
 |-- ID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- PhoneNumber: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- FormattedAddress: string (nullable = true)

The function that formats addresses:
def formatAddress(address: String): String
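
To make the question concrete, the UDF option might look roughly like the
sketch below. It assumes the udf and col helpers from
org.apache.spark.sql.functions and DataFrame.withColumn. The object name
AddressFormatting, the helper withFormattedAddress, and the body of
formatAddress are placeholders for illustration only; the real function is
much more complex.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object AddressFormatting {
  // Placeholder body (assumption): the real formatting logic is far more involved.
  // Null is passed through because the Address column is nullable.
  def formatAddress(address: String): String =
    if (address == null) null else address.trim.replaceAll("\\s+", " ")

  // Wrap the plain Scala function so it can be applied to a Column.
  val formatAddressUdf = udf(formatAddress _)

  // Return a new dataframe with FormattedAddress appended;
  // ID, Name, PhoneNumber and Address are left unchanged.
  def withFormattedAddress(df: DataFrame): DataFrame =
    df.withColumn("FormattedAddress", formatAddressUdf(col("Address")))
}
```

Is something like AddressFormatting.withFormattedAddress(inputDf) the
recommended approach, or is mapping over the rows preferable?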


Best regards,
Hao Wang
