@bastien, in those situations I prefer to use Unix timestamps (millisecond or second granularity), because you can apply math operations to them directly. If you don't have a Unix timestamp, you can use unix_timestamp() from Hive SQL to get one with second granularity. Grouping by hour then becomes very simple:

select 3600*floor(timestamp/3600) as timestamp,
       count(error) as errors
from logs
group by 3600*floor(timestamp/3600)

Hope this helps.

/Sim
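P.S. If your logs store a formatted time string rather than a Unix timestamp, the conversion can be folded into the same query. This is a minimal sketch, assuming a hypothetical string column event_time in Hive's default 'yyyy-MM-dd HH:mm:ss' format (adjust the column name and format to your schema):

-- unix_timestamp(event_time) converts the string to seconds since the epoch,
-- so the same floor-to-the-hour trick applies
select 3600*floor(unix_timestamp(event_time)/3600) as ts_hour,
       count(error) as errors
from logs
group by 3600*floor(unix_timestamp(event_time)/3600)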