Re: How to add a new column with date duration from 2 date columns in a dataframe

2015-08-26 Thread Dhaval Patel
Thanks Davies. HiveContext seems neat to use :) On Thu, Aug 20, 2015 at 3:02 PM, Davies Liu dav...@databricks.com wrote: As Aram said, there are two options in Spark 1.4: 1) Use the HiveContext, then you get datediff from Hive: df.selectExpr("datediff(d2, d1)") 2) Use a Python UDF: ``` from

Re: How to add a new column with date duration from 2 date columns in a dataframe

2015-08-20 Thread Dhaval Patel
Apologies, sent too early accidentally. The actual message is below. A dataframe has 2 date columns (datetime type), and I would like to add another column holding the difference between these two dates. A dataframe snippet is below.

How to add a new column with date duration from 2 date columns in a dataframe

2015-08-20 Thread Dhaval Patel
new_df.withColumn('SVCDATE2', (new_df.next_diag_date-new_df.SVCDATE).days).show()

+-----------+----------+--------------+
|      PATID|   SVCDATE|next_diag_date|
+-----------+----------+--------------+
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|

Re: How to add a new column with date duration from 2 date columns in a dataframe

2015-08-20 Thread Davies Liu
As Aram said, there are two options in Spark 1.4:
1) Use the HiveContext, then you get datediff from Hive: df.selectExpr("datediff(d2, d1)")
2) Use a Python UDF: ``` from datetime import date df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))], ['d1', 'd2']) from
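The UDF body in the message above is cut off in this archive, so what follows is only a rough sketch of the kind of plain Python function such a UDF could wrap, not the code Davies posted. The name `days_between` and the null handling are assumptions made for illustration; the sample dates are the ones from the message.

```python
from datetime import date


def days_between(start, end):
    """Return the number of whole days from start to end (hypothetical helper)."""
    if start is None or end is None:
        # Assumption: mirror SQL semantics, where a null input gives a null result.
        return None
    # Subtracting two date objects yields a timedelta; .days is its day count.
    return (end - start).days


# Same sample dates as in the message above.
print(days_between(date(2008, 8, 18), date(2008, 9, 26)))

# Assumed registration in PySpark (not run here, since it needs a Spark context):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import IntegerType
#   diff_udf = udf(days_between, IntegerType())
#   df.withColumn('diff', diff_udf(df.d1, df.d2))
```

The subtraction happens on Python `date` values inside the function, which is why `.days` works here but not on a DataFrame column expression such as `(new_df.next_diag_date-new_df.SVCDATE)`.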

Re: How to add a new column with date duration from 2 date columns in a dataframe

2015-08-20 Thread Dhaval Patel
One more update on this question: I am using Spark 1.4.1. I was just reading the documentation for Spark 1.5 (still in development), and I think there will be a new function, *datediff*, that will solve the issue. So please let me know if there is any workaround until Spark 1.5 is out :).

Re: How to add a new column with date duration from 2 date columns in a dataframe

2015-08-20 Thread Aram Mkrtchyan
Hi, hope this will help you:
import org.apache.spark.sql.functions._
import sqlContext.implicits._
import java.sql.Timestamp
val df = sc.parallelize(Array((date1, date2))).toDF("day1", "day2")
val dateDiff = udf[Long, Timestamp, Timestamp]((value1, value2) =