[
https://issues.apache.org/jira/browse/BEAM-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401073#comment-17401073
]
Ryan Skraba commented on BEAM-12544:
------------------------------------
I left a comment on the PR: throwing away the microsecond resolution is
probably not an acceptable solution, even though our {{DATETIME}} only supports
milliseconds.
For example, this
[article|https://www.bloomberg.com/professional/blog/mifid-ii-clock-synchronization-advice-dont-drop-the-mic-part-1/]
describes the regulatory requirements for
microsecond-precision timestamps in financial trading. _(TL;DR:
Microsecond-precision timestamps are sometimes important.)_
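To make the loss concrete, here is a minimal sketch in plain Java (no Beam or Avro APIs involved; the class and method names are hypothetical and chosen only for illustration). It assumes the value is a count of microseconds since the epoch, as {{timestamp-micros}} stores it, and shows two distinct timestamps collapsing into one after truncation to milliseconds:

```java
import java.time.Instant;

// Hypothetical illustration only: not part of AvroUtils or any Beam API.
class TruncationLossDemo {

    // Truncates microseconds since the epoch to milliseconds.
    // floorDiv rounds toward negative infinity, so pre-1970 values stay ordered.
    static long microsToMillis(long epochMicros) {
        return Math.floorDiv(epochMicros, 1_000L);
    }

    public static void main(String[] args) {
        long first = 1_629_300_000_123_456L;   // ...123 ms + 456 us
        long second = 1_629_300_000_123_999L;  // ...123 ms + 999 us

        // Both values land on the same millisecond, so they become indistinguishable.
        System.out.println(Instant.ofEpochMilli(microsToMillis(first)));
        System.out.println(Instant.ofEpochMilli(microsToMillis(second)));
        System.out.println(microsToMillis(first) == microsToMillis(second));
    }
}
```

Any ordering or latency measurement between two such events is gone once both are stored as a millisecond-precision {{DATETIME}}.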
There are several ways to make this optional or to avoid losing information:
# *Make the choice explicit*: add a configuration knob to the AvroUtils
conversion, such as {{cfg.setTruncateTimestampMicrosToMillis(true)}}. I don't
think there's currently a good place to put this, so we'd have to invent one.
# *Preprocess every single Avro record*: if the date semantics are more
important than the microsecond precision, then provide an optional transform
that turns each logical type from {{timestamp-micros}} to {{timestamp-millis}}
before conversion to a Beam Row.
# *Postprocess every single Beam Row*: provide an optional
{{AvroTruncateTransform<Row, Row>}} utility that uses an Avro schema to turn
every Beam {{LONG}} field that is actually a timestamp into a Beam {{DATETIME}}
after conversion to a Beam Row.
# *Annotate the Avro*: Only do the conversion/truncation if the Avro schema has
the annotation {{beam.row.truncateTimestampMicrosToMillis}} set to {{true}}.
This puts some processing information into the schema, but is probably the
least intrusive.
# *Convert to a DATETIME and LONG*: When you encounter a record with a
{{timestamp-micros}} field named {{ts}}, convert it into two output fields
{{ts_as_datetime}} and {{ts_as_long}}, and let a subsequent projection select
the one you actually want.
# *Convert to a structure including DATETIME and INT*: When you encounter a
record with a {{timestamp-micros}} field named {{ts}}, convert it into a nested
record with a {{DATETIME}} component and an {{INT}} just for the microsecond
part.
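As a rough sketch of the arithmetic behind the last option, the split into a millisecond-precision instant plus a leftover microsecond count could look like the following. This is plain Java with hypothetical names ({{SplitTimestamp}}, {{fromEpochMicros}}, {{toEpochMicros}}); it does not use the Beam Row or Avro APIs and only shows that the split is lossless:

```java
import java.time.Instant;

// Hypothetical illustration only: a stand-in for a nested record with
// a DATETIME component and an INT component.
final class SplitTimestamp {
    final Instant datetime;   // millisecond precision, what the DATETIME would hold
    final int microsOfMilli;  // remaining microseconds in [0, 999], what the INT would hold

    SplitTimestamp(Instant datetime, int microsOfMilli) {
        this.datetime = datetime;
        this.microsOfMilli = microsOfMilli;
    }

    // Splits microseconds since the epoch without discarding anything.
    // floorDiv/floorMod keep the remainder non-negative for pre-1970 values.
    static SplitTimestamp fromEpochMicros(long epochMicros) {
        long millis = Math.floorDiv(epochMicros, 1_000L);
        int micros = (int) Math.floorMod(epochMicros, 1_000L);
        return new SplitTimestamp(Instant.ofEpochMilli(millis), micros);
    }

    // Recombines both parts into the original microsecond count.
    long toEpochMicros() {
        return Math.addExact(
            Math.multiplyExact(datetime.toEpochMilli(), 1_000L), microsOfMilli);
    }

    public static void main(String[] args) {
        long original = 1_629_300_000_123_456L;
        SplitTimestamp split = fromEpochMicros(original);
        System.out.println(split.datetime + " + " + split.microsOfMilli + " us");
        System.out.println(split.toEpochMicros() == original);
    }
}
```

The {{DATETIME}} part keeps the date semantics usable on its own, and anyone who needs full precision can recombine the two fields.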
There are probably other solutions -- any ideas?
> Add support for Avro timestamps in microseconds
> -----------------------------------------------
>
> Key: BEAM-12544
> URL: https://issues.apache.org/jira/browse/BEAM-12544
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: Not applicable
> Reporter: Tobias Hermann
> Priority: P2
> Fix For: Not applicable
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> GenericRecordToRowFn in AvroUtils does not support the logical Avro type
> "timestamp-micros". Instead of converting it to FieldType.DATETIME (as it
> does with "timestamp-millis") it just interprets it as a raw LONG.