HI , If you need a data frame specific solution , you can try the below
df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu <yuzhih...@gmail.com> wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+ > when+using+Joda+DateTime > > On Mon, Feb 1, 2016 at 7:44 PM, Kevin Mellott <kevin.r.mell...@gmail.com> > wrote: > >> I've had pretty good success using Joda-Time >> <http://www.joda.org/joda-time/index.html> for date/time manipulations >> within Spark applications. You may be able to use the *DateTIme* constructor >> below, if you are starting with milliseconds. >> >> DateTime >> >> public DateTime(long instant) >> >> Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z >> using ISOChronology in the default time zone. >> Parameters:instant - the milliseconds from 1970-01-01T00:00:00Z >> >> On Mon, Feb 1, 2016 at 5:51 PM, Andy Davidson < >> a...@santacruzintegration.com> wrote: >> >>> What little I know about working with timestamps is based on >>> https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html >>> >>> Using the example of dates formatted into human friend strings -> >>> timeStamps I was able to figure out how to convert Epoch times to >>> timestamps. The same trick did not work for millisecond times. >>> >>> Any suggestions would be greatly appreciated. >>> >>> >>> Andy >>> >>> Working with epoch times >>> <http://localhost:8888/notebooks/sparkTimeSeriesExperiments.ipynb#Working-with-epock-times> >>> >>> ref: http://www.epochconverter.com/ >>> >>> Epoch timestamp: 1456050620 >>> >>> Timestamp in milliseconds: 1456050620000 >>> >>> Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT >>> >>> Human time (your time zone): 2/21/2016, 2:30:20 AM >>> >>> >>> # Epoch time stamp example >>> >>> data = [ >>> >>> ("1456050620", "1456050621", 1), >>> >>> ("1456050622", "14560506203", 2), >>> >>> ("14560506204", "14560506205", 3)] >>> >>> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"]) >>> >>> >>> >>> # convert epoch time strings in to spark timestamps >>> >>> df = df.select( >>> >>> df.start_time.cast("long").alias("start_time"), >>> >>> df.end_time.cast("long").alias("end_time"), >>> >>> df.id) >>> >>> df.printSchema() >>> >>> df.show(truncate=False) >>> >>> >>> >>> # convert longs to timestamps >>> >>> df = df.select( >>> >>> df.start_time.cast("timestamp").alias("start_time"), >>> >>> df.end_time.cast("timestamp").alias("end_time"), >>> >>> df.id) >>> >>> df.printSchema() >>> >>> df.show(truncate=False) >>> >>> >>> >>> root >>> |-- start_time: long (nullable = true) >>> |-- end_time: long (nullable = true) >>> |-- id: long (nullable = true) >>> >>> +-----------+-----------+---+ >>> |start_time |end_time |id | >>> +-----------+-----------+---+ >>> |1456050620 |1456050621 |1 | >>> |1456050622 |14560506203|2 | >>> |14560506204|14560506205|3 | >>> +-----------+-----------+---+ >>> >>> root >>> |-- start_time: timestamp (nullable = true) >>> |-- end_time: timestamp (nullable = true) >>> |-- id: long (nullable = true) >>> >>> +---------------------+---------------------+---+ >>> |start_time |end_time |id | >>> +---------------------+---------------------+---+ >>> |2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1 | >>> |2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2 | >>> |2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3 | >>> +---------------------+---------------------+---+ >>> >>> >>> In [21]: >>> >>> # working with millisecond times >>> >>> data = [ >>> >>> ("1456050620000", "1456050620000", 1)] >>> >>> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"]) >>> >>> >>> >>> # convert epoch time strings in to spark timestamps >>> >>> df = df.select( >>> >>> df.start_time.cast("long").alias("start_time"), >>> >>> df.end_time.cast("long").alias("end_time"), >>> >>> df.id) >>> >>> df.printSchema() >>> >>> df.show(truncate=False) >>> >>> >>> >>> # convert longs to timestamps >>> >>> df = df.select( >>> >>> df.start_time.cast("timestamp").alias("start_time"), >>> >>> df.end_time.cast("timestamp").alias("end_time"), >>> >>> df.id) >>> >>> df.printSchema() >>> >>> df.show(truncate=False) >>> >>> root >>> |-- start_time: long (nullable = true) >>> |-- end_time: long (nullable = true) >>> |-- id: long (nullable = true) >>> >>> +-------------+-------------+---+ >>> |start_time |end_time |id | >>> +-------------+-------------+---+ >>> |1456050620000|1456050620000|1 | >>> +-------------+-------------+---+ >>> >>> root >>> |-- start_time: timestamp (nullable = true) >>> |-- end_time: timestamp (nullable = true) >>> |-- id: long (nullable = true) >>> >>> +----------------------+----------------------+---+ >>> |start_time |end_time |id | >>> +----------------------+----------------------+---+ >>> |48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1 | >>> +----------------------+----------------------+---+ >>> >>> >>> >> >