[ 
https://issues.apache.org/jira/browse/BEAM-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188857#comment-16188857
 ] 

ASF GitHub Bot commented on BEAM-1936:
--------------------------------------

GitHub user rniemo-g opened a pull request:

    https://github.com/apache/beam/pull/3933

    [BEAM-1936] Add ability to provide function to extract timestamp from 
payload in …

    
    What: This pull request adds the ability to extract timestamps from Pubsub 
message bodies. 
    Why: Currently the name of an attribute containing the timestamp can be 
provided, but when the user does not control the message publisher, correct 
windowing requires a way to pull timestamps from the message payload.
    How: Instead of passing a timestamp attribute name (String) through 
PubsubIO -> PubsubClient, this has been generalized to a 
PubsubTimestampExtractor, which can be instantiated in three ways: 
    1) With the empty constructor. The default publish time is used as the 
message timestamp.
    2) With a String timestamp attribute name. This works the same as the 
current approach.
    3) With a Function<String, String> extractor. The function takes the 
message payload as a string and returns the timestamp as a string. If the 
returned value cannot be parsed as a timestamp, an exception is thrown saying 
the timestamp cannot be interpreted (the same behavior as when a timestamp 
attribute cannot be interpreted as a timestamp).
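
    The three construction modes above can be sketched roughly as follows. 
This is an illustrative sketch only, not the code from the pull request: the 
class name TimestampExtractorSketch, the static factory methods (used here in 
place of the constructors described above), the extractMillis signature, and 
the accepted timestamp formats (epoch milliseconds or an RFC 3339 string) are 
all assumptions made for the example.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the three extraction modes; not the PR's actual class. */
final class TimestampExtractorSketch {
    private final String attributeName;               // set only in mode 2
    private final Function<String, String> payloadFn; // set only in mode 3

    private TimestampExtractorSketch(String attributeName,
                                     Function<String, String> payloadFn) {
        this.attributeName = attributeName;
        this.payloadFn = payloadFn;
    }

    /** Mode 1: fall back to the publish time of the message. */
    static TimestampExtractorSketch usingPublishTime() {
        return new TimestampExtractorSketch(null, null);
    }

    /** Mode 2: read the timestamp from a named message attribute. */
    static TimestampExtractorSketch fromAttribute(String name) {
        return new TimestampExtractorSketch(name, null);
    }

    /** Mode 3: apply a user function to the payload to obtain the timestamp. */
    static TimestampExtractorSketch fromPayload(Function<String, String> fn) {
        return new TimestampExtractorSketch(null, fn);
    }

    /** Returns the message timestamp in milliseconds since the Unix epoch. */
    long extractMillis(String payload, Map<String, String> attributes,
                       long publishTimeMillis) {
        if (payloadFn != null) {
            return parse(payloadFn.apply(payload));
        }
        if (attributeName != null) {
            return parse(attributes.get(attributeName));
        }
        return publishTimeMillis;
    }

    /** Assumed formats: epoch milliseconds, or an RFC 3339 instant. */
    private static long parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot interpret value as a timestamp: " + raw);
        }
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException notMillis) {
            // Not a plain number; try the RFC 3339 form next.
        }
        try {
            return Instant.parse(raw).toEpochMilli();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                "Cannot interpret value as a timestamp: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical payload layout: "<timestamp>|<body>".
        TimestampExtractorSketch extractor =
            fromPayload(p -> p.substring(0, p.indexOf('|')));
        long millis = extractor.extractMillis(
            "2017-10-02T19:09:01Z|hello", Collections.emptyMap(), 0L);
        System.out.println(millis);
    }
}
```

    Keeping the parse step shared between modes 2 and 3 is what gives the 
payload function the same failure behavior as an uninterpretable attribute.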
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rniemo-g/beam pubsub-timestamp-extraction

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3933
    
----
commit 611b81b360ecce8e2955bbd4e77d02bf84e8ec30
Author: Ryan Niemocienski <[email protected]>
Date:   2017-10-02T19:09:01Z

    Add ability to provide function to extract timestamp from payload in 
PubsubIO

----


> Allow user provided function to extract custom timestamp from payload in 
> pubsubIO
> ---------------------------------------------------------------------------------
>
>                 Key: BEAM-1936
>                 URL: https://issues.apache.org/jira/browse/BEAM-1936
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-gcp
>            Reporter: Keith Berkoben
>
> Currently the PubsubIO runner only allows the caller to set a custom 
> timestamp if the timestamp is defined in the attributes of the message.  This 
> can be problematic when the user does not control the publisher.  In such a 
> case, proper windowing of data requires the timestamp to be pulled out of the 
> message payload.  
> Since a payload could contain arbitrary data, the user would have to provide 
> a Function<T, String> that extracts the timestamp from the payload:
> PubsubIO.Read.timestampLabel(Function<T, String> extractor);



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
