OK, I've worked it out. A built-in column expression does it, no UDF needed:

    from pyspark.sql.functions import col
    df.withColumn('diff', col('A') - col('B'))

On Sun, May 7, 2017 at 11:49 AM, Zeming Yu <zemin...@gmail.com> wrote:

> Say I have the following dataframe with two numeric columns A and B,
> what's the best way to add a column showing the difference between the two
> columns?
>
> +-----------------+----------+
> |                A|         B|
> +-----------------+----------+
> |786.3199999999999|    786.12|
> |           786.12|    786.12|
> |           786.42|    786.12|
> |           786.72|    786.12|
> |           786.92|    786.12|
> |           786.92|    786.12|
> |           786.72|    786.12|
> |           786.72|    786.12|
> |           827.72|    786.02|
> |           827.72|    786.02|
> +-----------------+----------+
>
> I could probably figure out how to do this via a UDF, but is a UDF generally
> slower?
>
> Thanks!