Timestamps in Avro are represented as a long, and there is a logical type (timestamp-millis) for exactly this. Your best bet is to convert the string to that long representation, e.g. milliseconds since the epoch, and then write it to Avro; Hive will automatically read such a column as a timestamp. If you already have the data in a Hive table as a string, you can run a query that parses the string into an actual timestamp and then write the result to Parquet. Long story short: I would change whatever is writing the Avro file to use the proper timestamp encoding.
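As a sketch of the conversion step (assuming the string timestamps are UTC and use the yyyy-MM-dd HH:mm:ss pattern from your example event; adjust the zone and pattern if not), using only java.time:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampToMillis {
    // Pattern matching the sample event "2020-10-13 13:01:05"
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parse a string timestamp and return milliseconds since the epoch. */
    static long toEpochMillis(String ts) {
        // Assumption: the source timestamps are in UTC
        return LocalDateTime.parse(ts, FMT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020-10-13 13:01:05"));
        // prints 1602594065000
    }
}
```

That long value is what you would put in the Avro record's timestamp field (with the logicalType set on the schema) instead of the original string.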
https://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29

On Fri, Oct 16, 2020 at 1:26 PM Almeida, Julius <[email protected]> wrote:
>
> Hi Team,
>
> I wish to store a timestamp (e.g. 2020-10-13 13:01:05) in Parquet using an
> Avro schema to load into Hive, but I don't see any option to do so. I pass
> it as a string, but Hive fails to read it as it is expecting a timestamp.
> I am using ParquetIO from Apache Beam to write output files in Parquet format.
>
> Avro schema:
> {
>   "type": "record",
>   "name": "TestRecord",
>   "namespace": "com.test-v2.avro",
>   "fields": [{
>     "name": "reason",
>     "type": "string"
>   }, {
>     "name": "event",
>     "type": "string"
>   }, {
>     "name": "timestamp",
>     "type": "string"
>   }]
> }
>
> Output event:
> {
>   "reason": "Invalid",
>   "event": "dropped",
>   "timestamp": "2020-10-13 13:01:05"
> }
>
> Hive schema:
> reason -> varchar(100),
> event -> varchar(100),
> timestamp -> timestamp
>
> But when trying to query Hive, it fails to read the timestamp column since
> the input event has string type.
>
> I see this Jira in place: https://issues.apache.org/jira/browse/AVRO-2924
>
> And I believe the code change can be done here:
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java
>
> Correct me if I am wrong.
>
> Thanks,
> Julius
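For reference, the schema quoted above could declare the logical type like this (a sketch only; it keeps the record name and namespace as posted, and the field type becomes a long annotated with timestamp-millis per the Avro spec):

```
{
  "type": "record",
  "name": "TestRecord",
  "namespace": "com.test-v2.avro",
  "fields": [{
    "name": "reason",
    "type": "string"
  }, {
    "name": "event",
    "type": "string"
  }, {
    "name": "timestamp",
    "type": { "type": "long", "logicalType": "timestamp-millis" }
  }]
}
```

With that schema, the producer writes epoch milliseconds into the field, and the Hive column can stay declared as timestamp.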
