GitHub user rniemo-g opened a pull request:
https://github.com/apache/beam/pull/3933
[BEAM-1936] Add ability to provide function to extract timestamp from
payload in …
What: This pull request adds the ability to extract timestamps from Pubsub
message bodies.
Why: Currently a user can provide the name of a message attribute that
contains the timestamp. However, when the user doesn't control the message
publisher, correct windowing requires a way to pull timestamps from the
message payload.
How: Instead of passing a timestamp attribute name (String) through
PubsubIO -> PubsubClient, this has been generalized to a
PubsubTimestampExtractor, which can be instantiated in 3 ways:
1) With the empty constructor: the default publish time is used as the
message timestamp.
2) With a String timestamp attribute name: this works the same as the
current approach.
3) With a Function<String, String> extractor: the function takes the
message payload as a string and returns the parsed timestamp. If the
returned value can't be interpreted as a timestamp, an exception is thrown
saying so (the same behavior as when a timestamp attribute can't be
interpreted as a timestamp).
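To make the three modes concrete, here is a minimal, self-contained Java sketch of how such an extractor could behave. It is not the code from this pull request: the extractTimestampMillis(...) method, its parameters, the exception message, and the parsing rule (milliseconds since the epoch, or an ISO-8601 instant) are assumptions made for illustration. It uses plain java.util.function.Function rather than any Beam type.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the PubsubTimestampExtractor described above.
// The method signature and parsing rules are assumptions, not the PR's code.
final class PubsubTimestampExtractor {
  private final String timestampAttribute;                  // mode 2
  private final Function<String, String> payloadExtractor;  // mode 3

  // 1) Empty constructor: fall back to the Pubsub publish time.
  PubsubTimestampExtractor() {
    this(null, null);
  }

  // 2) Read the timestamp from a named message attribute.
  PubsubTimestampExtractor(String timestampAttribute) {
    this(timestampAttribute, null);
  }

  // 3) Derive the timestamp from the message payload.
  PubsubTimestampExtractor(Function<String, String> payloadExtractor) {
    this(null, payloadExtractor);
  }

  private PubsubTimestampExtractor(String attr, Function<String, String> fn) {
    this.timestampAttribute = attr;
    this.payloadExtractor = fn;
  }

  // Returns the event timestamp in milliseconds since the epoch.
  long extractTimestampMillis(
      long publishTimeMillis, Map<String, String> attributes, String payload) {
    String raw;
    if (payloadExtractor != null) {
      raw = payloadExtractor.apply(payload);
    } else if (timestampAttribute != null) {
      raw = attributes == null ? null : attributes.get(timestampAttribute);
    } else {
      return publishTimeMillis;  // mode 1: publish time is the timestamp
    }
    return parseTimestampMillis(raw);
  }

  // Assumed parsing rule: millis since the epoch, else an ISO-8601 instant
  // such as "2017-10-02T19:09:01Z". Anything else is rejected.
  static long parseTimestampMillis(String raw) {
    if (raw == null || raw.isEmpty()) {
      throw new IllegalArgumentException("Cannot interpret timestamp: " + raw);
    }
    try {
      return Long.parseLong(raw);
    } catch (NumberFormatException notMillis) {
      try {
        return Instant.parse(raw).toEpochMilli();
      } catch (DateTimeParseException notIso) {
        throw new IllegalArgumentException("Cannot interpret timestamp: " + raw, notIso);
      }
    }
  }
}
```

As a usage example, for a hypothetical payload such as `2017-10-02T19:09:01Z,click`, passing `p -> p.substring(0, p.indexOf(','))` to the third constructor yields that instant in milliseconds, while a payload like `not-a-time,click` raises an IllegalArgumentException. In an actual Beam pipeline the function is shipped to workers, so it would need to be serializable (Beam's SerializableFunction is one option).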
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rniemo-g/beam pubsub-timestamp-extraction
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3933.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3933
----
commit 611b81b360ecce8e2955bbd4e77d02bf84e8ec30
Author: Ryan Niemocienski <[email protected]>
Date: 2017-10-02T19:09:01Z
Add ability to provide function to extract timestamp from payload in
PubsubIO
----