Thanks Davies. HiveContext seems neat to use :)
On Thu, Aug 20, 2015 at 3:02 PM, Davies Liu dav...@databricks.com wrote:
Apologies, accidentally sent too early. The actual message is below.
A dataframe has two date columns (datetime type) and I would like to add
another column that would have the difference between these two dates.
A dataframe snippet is below.

```
new_df.withColumn('SVCDATE2',
    (new_df.next_diag_date - new_df.SVCDATE).days).show()
```
```
+-----------+----------+--------------+
|      PATID|   SVCDATE|next_diag_date|
+-----------+----------+--------------+
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|           ...|
```
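(The per-row arithmetic wanted here is plain standard-library date subtraction: subtracting two `date` objects yields a `timedelta`, and its `.days` attribute is the whole-day gap. A Spark `Column` subtraction does not yield a Python `timedelta`, so `.days` cannot be used directly on it. A minimal no-Spark sketch, using the `SVCDATE` from the snippet and a made-up second date for illustration:)

```python
from datetime import date

# Subtracting two dates gives a timedelta; .days is the whole-day gap
d1 = date(2012, 2, 13)   # SVCDATE from the snippet above
d2 = date(2012, 3, 1)    # hypothetical next_diag_date
print((d2 - d1).days)    # prints 17 (2012 is a leap year)
```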
As Aram said, there are two options in Spark 1.4:
1) Use the HiveContext; then you get datediff from Hive:
df.selectExpr("datediff(d2, d1)")
2) Use Python UDF:
```
from datetime import date
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))],
                                ['d1', 'd2'])

# Plain Python does the per-row work; Spark applies it to each row
date_diff = udf(lambda d1, d2: (d2 - d1).days, IntegerType())
df.select(date_diff(df.d1, df.d2).alias('diff')).show()
```
More update on this question... I am using Spark 1.4.1.
I was just reading the documentation of Spark 1.5 (still in development) and I
think there will be a new function *datediff* that will solve the issue. So
please let me know if there is any workaround until Spark 1.5 is out :).
Hi,
Hope this helps:

```
import org.apache.spark.sql.functions._
import sqlContext.implicits._
import java.sql.Timestamp
import java.util.concurrent.TimeUnit

// date1 and date2 are java.sql.Timestamp values defined elsewhere
val df = sc.parallelize(Array((date1, date2))).toDF("day1", "day2")

// UDF: millisecond gap between two timestamps, truncated to whole days
val dateDiff = udf[Long, Timestamp, Timestamp]((value1, value2) =>
  TimeUnit.MILLISECONDS.toDays(value2.getTime - value1.getTime))
df.withColumn("diff", dateDiff($"day2", $"day1")).show()
```
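(One common way to implement such a timestamp-based UDF is to convert the millisecond gap between the two values into whole days. A quick plain-Python sanity check of that arithmetic, with example dates made up for illustration:)

```python
from datetime import datetime

# Whole-day difference via epoch-millisecond arithmetic, mirroring
# what a Timestamp-based UDF computes
t1 = datetime(2008, 8, 18)
t2 = datetime(2008, 9, 26)
millis = int((t2 - t1).total_seconds() * 1000)
days = millis // (24 * 60 * 60 * 1000)
print(days)  # prints 39
```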