What little I know about working with timestamps is based on
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html

Using the blog's example of converting human-friendly date strings to
timestamps, I was able to figure out how to convert epoch times (in seconds)
to timestamps. The same trick did not work for millisecond epoch times.

Any suggestions would be greatly appreciated.


Andy

Working with epoch times
<http://localhost:8888/notebooks/sparkTimeSeriesExperiments.ipynb#Working-with-epock-times>
ref: http://www.epochconverter.com/
Epoch timestamp:  1456050620
Timestamp in milliseconds: 1456050620000
Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
Human time (your time zone): 2/21/2016, 2:30:20 AM

# epoch timestamp example (values are seconds since the epoch)
data = [
  ("1456050620", "1456050621", 1),
  ("1456050622", "14560506203", 2),
  ("14560506204", "14560506205", 3)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# cast the epoch-second strings to longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# cast the longs to timestamps (the long is interpreted as seconds since the epoch)
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-----------+-----------+---+
|start_time |end_time   |id |
+-----------+-----------+---+
|1456050620 |1456050621 |1  |
|1456050622 |14560506203|2  |
|14560506204|14560506205|3  |
+-----------+-----------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+---------------------+---------------------+---+
|start_time           |end_time             |id |
+---------------------+---------------------+---+
|2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
|2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
|2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
+---------------------+---------------------+---+
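(Note that the end_time in row 2 and both values in row 3 have 11 digits;
cast as seconds they land in the year 2431, which looks like the same
symptom as the millisecond case below.)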

# working with millisecond times
data = [
  ("1456050620000", "1456050620000", 1)]
  
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# cast the epoch-millisecond strings to longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# cast the longs to timestamps (again interpreted as seconds, not milliseconds)
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)

root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-------------+-------------+---+
|start_time   |end_time     |id |
+-------------+-------------+---+
|1456050620000|1456050620000|1  |
+-------------+-------------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+----------------------+----------------------+---+
|start_time            |end_time              |id |
+----------------------+----------------------+---+
|48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
+----------------------+----------------------+---+
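The best workaround I have come up with so far is to divide the millisecond
longs by 1000 before the timestamp cast. This is only a sketch; I am assuming
the numeric-to-timestamp cast always means seconds since the epoch:

# sketch: scale milliseconds down to seconds before casting to timestamp
data = [("1456050620000", "1456050620000", 1)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
df = df.select(
  (df.start_time.cast("long") / 1000).cast("timestamp").alias("start_time"),
  (df.end_time.cast("long") / 1000).cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)

Is dividing by 1000 the idiomatic way to handle millisecond epochs, or is
there a built-in function I am missing?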


