[ 
https://issues.apache.org/jira/browse/BEAM-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401073#comment-17401073
 ] 

Ryan Skraba commented on BEAM-12544:
------------------------------------

I left a comment on the PR: throwing away the microsecond resolution is 
probably not an acceptable solution, even though our {{DATETIME}} only supports 
milliseconds.
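To make the loss concrete: an Avro {{timestamp-micros}} value is a {{long}} count of microseconds since the Unix epoch, so fitting it into a millisecond type drops the last three digits. Below is a minimal, stdlib-only Java sketch. It uses plain {{java.time}} and no Beam or Avro classes, and {{MicrosTruncation}} and its methods are hypothetical names used only for illustration.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: what happens when a timestamp-micros value is forced into millisecond precision. */
class MicrosTruncation {

  /** Truncates microseconds since the epoch to milliseconds (floor, so pre-1970 values round down too). */
  static long toMillis(long epochMicros) {
    return Math.floorDiv(epochMicros, 1_000L);
  }

  /** The microseconds discarded by toMillis, always in [0, 999]. */
  static long droppedMicros(long epochMicros) {
    return Math.floorMod(epochMicros, 1_000L);
  }

  /** A full-precision Instant for the same value, for comparison. */
  static Instant toInstant(long epochMicros) {
    return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    long micros = 1_629_300_000_123_456L;
    // Millisecond precision only, as a DATETIME would hold it.
    System.out.println(Instant.ofEpochMilli(toMillis(micros)));
    // Keeps all six fractional digits.
    System.out.println(toInstant(micros));
    System.out.println("dropped micros: " + droppedMicros(micros));
  }
}
```

Running {{main}} prints the same instant twice, once cut off at {{.123}} and once with the full {{.123456}} fraction. The trailing {{456}} is exactly the information a millisecond-only {{DATETIME}} cannot hold.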

Here's one 
[article|https://www.bloomberg.com/professional/blog/mifid-ii-clock-synchronization-advice-dont-drop-the-mic-part-1/],
 for example, that describes the regulatory requirements for 
microsecond-precision timestamps in financial trading. _(TL;DR: 
Microsecond-precision timestamps are sometimes important.)_

There are a couple of ways to make this optional or to avoid losing information:

# *Make the choice explicit*: add a configuration knob to the AvroUtils 
conversion, such as {{cfg.setTruncateTimestampMicrosToMillis(true)}}.  I don't 
think there's currently a good place to put this, so we'd have to invent it.
# *Preprocess every single Avro record*: if the date semantics are more 
important than the microsecond precision, then provide an optional transform 
that turns each logical type from {{timestamp-micros}} to {{timestamp-millis}} 
before conversion to a Beam Row.
# *Postprocess every single Beam Row*: provide an optional 
{{AvroTruncateTransform<Row, Row>}} utility that uses an Avro schema to turn 
all of the Beam {{LONG}} fields that are actually timestamps into Beam 
{{DATETIME}} after conversion to a Beam Row.
# *Annotate the Avro schema*: only do the conversion/truncation if the Avro 
schema has the annotation {{beam.row.truncateTimestampMicrosToMillis}} set to 
{{true}}.  This puts some processing information into the schema, but is 
probably the least intrusive.
# *Convert to a DATETIME and LONG*: When you encounter a record with a 
{{timestamp-micros}} field named {{ts}}, convert it into two output fields 
{{ts_as_datetime}} and {{ts_as_long}}, and let a subsequent projection select 
the one you actually want.
# *Convert to a structure including DATETIME and INT*: When you encounter a 
record with a {{timestamp-micros}} field named {{ts}}, convert it into a nested 
record with a {{DATETIME}} component and an {{INT}} holding just the 
sub-millisecond microseconds.
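The last option keeps both the date semantics and the full precision. A minimal sketch of the split in plain Java, with {{java.time.Instant}} standing in for the {{DATETIME}} component; {{SplitTimestamp}} and its fields are hypothetical names, not existing Beam or Avro API.

```java
import java.time.Instant;

/**
 * Hypothetical sketch of the "DATETIME plus INT" option: a millisecond-precision
 * instant plus a separate field for the remaining microseconds, so nothing is lost.
 */
class SplitTimestamp {
  final Instant millisInstant; // would become the DATETIME component
  final int subMillisMicros;   // would become the INT component, always in [0, 999]

  SplitTimestamp(Instant millisInstant, int subMillisMicros) {
    this.millisInstant = millisInstant;
    this.subMillisMicros = subMillisMicros;
  }

  /** Splits microseconds since the epoch into whole milliseconds and leftover microseconds. */
  static SplitTimestamp fromEpochMicros(long epochMicros) {
    long millis = Math.floorDiv(epochMicros, 1_000L);
    int micros = (int) Math.floorMod(epochMicros, 1_000L);
    return new SplitTimestamp(Instant.ofEpochMilli(millis), micros);
  }

  /** Recombines both components into the original timestamp-micros value. */
  long toEpochMicros() {
    return millisInstant.toEpochMilli() * 1_000L + subMillisMicros;
  }
}
```

Using {{floorDiv}}/{{floorMod}} rather than {{/}} and {{%}} keeps the microsecond part non-negative for timestamps before 1970, so the round trip also holds for negative values.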

There are probably other solutions; any ideas?

> Add support for Avro timestamps in microseconds
> -----------------------------------------------
>
>                 Key: BEAM-12544
>                 URL: https://issues.apache.org/jira/browse/BEAM-12544
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: Not applicable
>            Reporter: Tobias Hermann
>            Priority: P2
>             Fix For: Not applicable
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> GenericRecordToRowFn in AvroUtils does not support the logical Avro type 
> "timestamp-micros". Instead of converting it to FieldType.DATETIME (as it 
> does with "timestamp-millis") it just interprets it as a raw LONG.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
