What little I know about working with timestamps is based on https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
Using the example of converting human-friendly date strings to timestamps, I was able to figure out how to convert epoch times (in seconds) to timestamps. The same trick did not work for millisecond times. Any suggestions would be greatly appreciated.

Andy

Working with epoch times

ref: http://www.epochconverter.com/

Epoch timestamp: 1456050620
Timestamp in milliseconds: 1456050620000
Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
Human time (your time zone): 2/21/2016, 2:30:20 AM

    # Epoch timestamp example
    data = [
        ("1456050620", "1456050621", 1),
        ("1456050622", "14560506203", 2),
        ("14560506204", "14560506205", 3)]
    df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])

    # convert epoch time strings into longs
    df = df.select(
        df.start_time.cast("long").alias("start_time"),
        df.end_time.cast("long").alias("end_time"),
        df.id)
    df.printSchema()
    df.show(truncate=False)

    # convert longs to timestamps
    df = df.select(
        df.start_time.cast("timestamp").alias("start_time"),
        df.end_time.cast("timestamp").alias("end_time"),
        df.id)
    df.printSchema()
    df.show(truncate=False)

    root
     |-- start_time: long (nullable = true)
     |-- end_time: long (nullable = true)
     |-- id: long (nullable = true)

    +-----------+-----------+---+
    |start_time |end_time   |id |
    +-----------+-----------+---+
    |1456050620 |1456050621 |1  |
    |1456050622 |14560506203|2  |
    |14560506204|14560506205|3  |
    +-----------+-----------+---+

    root
     |-- start_time: timestamp (nullable = true)
     |-- end_time: timestamp (nullable = true)
     |-- id: long (nullable = true)

    +---------------------+---------------------+---+
    |start_time           |end_time             |id |
    +---------------------+---------------------+---+
    |2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
    |2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
    |2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
    +---------------------+---------------------+---+

    # working with millisecond times
    data = [
        ("1456050620000", "1456050620000", 1)]
    df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])

    # convert epoch time strings into longs
    df = df.select(
        df.start_time.cast("long").alias("start_time"),
        df.end_time.cast("long").alias("end_time"),
        df.id)
    df.printSchema()
    df.show(truncate=False)

    # convert longs to timestamps
    df = df.select(
        df.start_time.cast("timestamp").alias("start_time"),
        df.end_time.cast("timestamp").alias("end_time"),
        df.id)
    df.printSchema()
    df.show(truncate=False)

    root
     |-- start_time: long (nullable = true)
     |-- end_time: long (nullable = true)
     |-- id: long (nullable = true)

    +-------------+-------------+---+
    |start_time   |end_time     |id |
    +-------------+-------------+---+
    |1456050620000|1456050620000|1  |
    +-------------+-------------+---+

    root
     |-- start_time: timestamp (nullable = true)
     |-- end_time: timestamp (nullable = true)
     |-- id: long (nullable = true)

    +----------------------+----------------------+---+
    |start_time            |end_time              |id |
    +----------------------+----------------------+---+
    |48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
    +----------------------+----------------------+---+
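For what it's worth, I believe the root cause is that casting a long to timestamp interprets the value as seconds since the epoch, so millisecond values overshoot by a factor of 1000 (hence the year 48110 above, and the year 2431 rows produced by the stray 11-digit values in the first example). A minimal sketch of the same arithmetic using only the Python stdlib, not Spark:

```python
from datetime import datetime, timezone

# a long interpreted as SECONDS since the epoch gives the expected time
secs = 1456050620
assert datetime.fromtimestamp(secs, tz=timezone.utc) == \
    datetime(2016, 2, 21, 10, 30, 20, tzinfo=timezone.utc)

# the stray 11-digit values in the example data land in year 2431,
# matching the DataFrame output above
assert datetime.fromtimestamp(14560506203, tz=timezone.utc).year == 2431

# millisecond values are 1000x too large; divide by 1000 before converting
millis = 1456050620000
fixed = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
assert fixed == datetime(2016, 2, 21, 10, 30, 20, tzinfo=timezone.utc)
```

If that diagnosis is right, the Spark equivalent would presumably be something like `(df.start_time / 1000).cast("timestamp")` — I have not verified that column expression myself, so treat it as an assumption.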