Timestamps in Avro are represented as a long, and there is a logical type (timestamp-millis) for exactly this. Your best bet is to convert the string to that long representation, e.g. milliseconds since the epoch, and then write it to Avro; Hive will automatically read such a column as a timestamp. If you already have the data in a Hive table as a string, you can run a query that parses the string into an actual timestamp and then write the result to Parquet. Long story short: I would change whatever is writing the Avro file to use the proper timestamp encoding.
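As a sketch of the conversion step (assuming the string timestamps are UTC and use the yyyy-MM-dd HH:mm:ss pattern from your example event; adjust the zone and pattern if not), using only java.time:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampToMillis {
    // Pattern matching the sample event "2020-10-13 13:01:05"
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parse a string timestamp and return milliseconds since the epoch. */
    static long toEpochMillis(String ts) {
        // Assumption: the source timestamps are in UTC
        return LocalDateTime.parse(ts, FMT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020-10-13 13:01:05"));
        // prints 1602594065000
    }
}
```

That long value is what you would put in the Avro record's timestamp field (with the logicalType set on the schema) instead of the original string.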
https://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29

On Fri, Oct 16, 2020 at 1:26 PM Almeida, Julius <[email protected]> wrote:
>
> Hi Team,
>
> I wish to store a timestamp (e.g. 2020-10-13 13:01:05) in Parquet using an
> Avro schema to load into Hive, but I don't see any option to do so. I pass
> it as a string, but Hive fails to read it as it is expecting a timestamp.
> I am using ParquetIO from Apache Beam to write output files in Parquet format.
>
> Avro schema:
> {
>   "type": "record",
>   "name": "TestRecord",
>   "namespace": "com.test-v2.avro",
>   "fields": [{
>     "name": "reason",
>     "type": "string"
>   }, {
>     "name": "event",
>     "type": "string"
>   }, {
>     "name": "timestamp",
>     "type": "string"
>   }]
> }
>
> Output event:
> {
>   "reason": "Invalid",
>   "event": "dropped",
>   "timestamp": "2020-10-13 13:01:05"
> }
>
> Hive schema:
> reason -> varchar(100),
> event -> varchar(100),
> timestamp -> timestamp
>
> But when trying to query Hive, it fails to read the timestamp column since
> the input event has string type.
>
> I see this Jira in place: https://issues.apache.org/jira/browse/AVRO-2924
>
> And I believe the code change can be done here:
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java
>
> Correct me if I am wrong.
>
> Thanks,
> Julius
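For reference, the schema quoted above could declare the logical type like this (a sketch only; it keeps the record name and namespace as posted, and the field type becomes a long annotated with timestamp-millis per the Avro spec):

```
{
  "type": "record",
  "name": "TestRecord",
  "namespace": "com.test-v2.avro",
  "fields": [{
    "name": "reason",
    "type": "string"
  }, {
    "name": "event",
    "type": "string"
  }, {
    "name": "timestamp",
    "type": { "type": "long", "logicalType": "timestamp-millis" }
  }]
}
```

With that schema, the producer writes epoch milliseconds into the field, and the Hive column can stay declared as timestamp.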
