Thank you, Udi. I left some high-level comments on the PR.

On Mon, Mar 19, 2018 at 5:13 PM, Udi Meiri <[email protected]> wrote:

> Hi,
> I wanted to get feedback about the upcoming Python Pubsub API. It is
> currently experimental and only supports reading and writing UTF-8 strings.
> My current proposal only concerns reading from Pubsub.
>
> Classes:
> - PubsubMessage: encapsulates Pubsub message payload and attributes.
>
> PTransforms:
> - ReadMessagesFromPubSub: Outputs elements of type ``PubsubMessage``.
>
> - ReadPayloadsFromPubSub: Outputs elements of type ``str``.
>
> - ReadStringsFromPubSub: Outputs elements of type ``unicode``, decoded
> from UTF-8.
>
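To make sure I read the three output types correctly, here is a minimal sketch in plain Python 3 (where the Python 2 ``str`` payload corresponds to ``bytes`` and ``unicode`` corresponds to ``str``). PubsubMessageSketch, to_payload and to_string are made-up names for illustration only; they are not the classes in the PR.

```python
class PubsubMessageSketch(object):
    """Hypothetical stand-in for PubsubMessage: payload bytes plus attributes."""

    def __init__(self, payload, attributes=None):
        self.payload = payload                    # raw message bytes
        self.attributes = dict(attributes or {})  # string-to-string map


def to_payload(message):
    # What a payload-only read would emit: the raw bytes, attributes dropped.
    return message.payload


def to_string(message):
    # What a string read would emit: the payload decoded from UTF-8.
    return message.payload.decode('utf-8')


msg = PubsubMessageSketch(b'caf\xc3\xa9', {'id': '42'})
raw = to_payload(msg)
text = to_string(msg)
```

If that matches the intent, the payload and string transforms are both thin mappings over the message transform.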
> Description of common PTransform arguments:
>   topic: Cloud Pub/Sub topic in the form
>     "projects/<project>/topics/<topic>". If provided, subscription must be
>     None.
>   subscription: Existing Cloud Pub/Sub subscription to use in the
>     form "projects/<project>/subscriptions/<subscription>". If not
>     specified, a temporary subscription will be created from the specified
>     topic. If provided, topic must be None.
>   id_label: The attribute on incoming Pub/Sub messages to use as a unique
>     record identifier. When specified, the value of this attribute (which
>     can be any string that uniquely identifies the record) will be used for
>     deduplication of messages. If not provided, we cannot guarantee
>     that no duplicate data will be delivered on the Pub/Sub stream. In this
>     case, deduplication of the stream will be strictly best effort.
>   timestamp_attribute: Name of the message attribute whose value is used
>     as the element timestamp. If None, the message publishing time is used
>     as the timestamp.
>     Timestamp values should be in one of two formats:
>     - A numerical value representing the number of milliseconds since the
>       Unix epoch.
>     - A string in RFC 3339 format. For example,
>       ``2015-10-29T23:41:41.123Z``. The sub-second component of the
>       timestamp is optional, and digits beyond the first three (i.e., time
>       units smaller than milliseconds) will be ignored.
>
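To check my reading of the timestamp_attribute rules above, here is a minimal sketch of the parsing in plain Python 3. The helper name timestamp_attribute_to_millis is made up for illustration and is not the code in the PR. It assumes the attribute value arrives as a string and that the numerical form is an integer.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper, not part of the Beam API.
_RFC3339 = re.compile(
    r'^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})'
    r'(?:\.(\d+))?'             # optional sub-second component
    r'(Z|z|[+-]\d{2}:\d{2})$')  # UTC designator or numeric offset


def timestamp_attribute_to_millis(value):
    """Returns milliseconds since the Unix epoch for an attribute value."""
    value = value.strip()
    # Format 1: an integer count of milliseconds since the Unix epoch.
    if re.fullmatch(r'-?\d+', value):
        return int(value)
    # Format 2: an RFC 3339 string.
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError('Unsupported timestamp format: %r' % value)
    fields = [int(g) for g in match.groups()[:6]]
    fraction, offset = match.group(7), match.group(8)
    # Keep only the first three fractional digits; anything finer than
    # milliseconds is ignored, and a missing fraction counts as zero.
    millis = int((fraction or '')[:3].ljust(3, '0'))
    seconds = int(datetime(*fields, tzinfo=timezone.utc).timestamp())
    if offset not in ('Z', 'z'):
        sign = 1 if offset[0] == '+' else -1
        # Local time is ahead of UTC by a positive offset, so subtract it.
        seconds -= sign * (int(offset[1:3]) * 3600 + int(offset[4:6]) * 60)
    return seconds * 1000 + millis


example = timestamp_attribute_to_millis('2015-10-29T23:41:41.123Z')
```

With this reading, '2015-10-29T23:41:41.123456Z' and '2015-10-29T23:41:41.123Z' map to the same element timestamp.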
> Code: https://github.com/udim/beam/blob/b981dd618e9e1f667597eec2a91c7265a389c405/sdks/python/apache_beam/io/gcp/pubsub.py
> PR: https://github.com/apache/beam/pull/4901
>
>
