Thank you, Udi. I left some high-level comments on the PR.

On Mon, Mar 19, 2018 at 5:13 PM, Udi Meiri <[email protected]> wrote:

> Hi,
> I wanted to get feedback about the upcoming Python Pubsub API. It is
> currently experimental and only supports reading and writing UTF-8 strings.
> My current proposal only concerns reading from Pubsub.
>
> Classes:
> - PubsubMessage: encapsulates Pubsub message payload and attributes.
>
> PTransforms:
> - ReadMessagesFromPubSub: Outputs elements of type ``PubsubMessage``.
>
> - ReadPayloadsFromPubSub: Outputs elements of type ``str``.
>
> - ReadStringsFromPubSub: Outputs elements of type ``unicode``, decoded
>   from UTF-8.
>
> Description of common PTransform arguments:
>   topic: Cloud Pub/Sub topic in the form
>     "projects/<project>/topics/<topic>".
>     If provided, subscription must be None.
>   subscription: Existing Cloud Pub/Sub subscription to use in the
>     form "projects/<project>/subscriptions/<subscription>". If not
>     specified, a temporary subscription will be created from the
>     specified topic. If provided, topic must be None.
>   id_label: The attribute on incoming Pub/Sub messages to use as a unique
>     record identifier. When specified, the value of this attribute (which
>     can be any string that uniquely identifies the record) will be used for
>     deduplication of messages. If not provided, we cannot guarantee
>     that no duplicate data will be delivered on the Pub/Sub stream. In this
>     case, deduplication of the stream will be strictly best effort.
>   timestamp_attribute: Message value to use as element timestamp. If None,
>     uses message publishing time as the timestamp.
>     Timestamp values should be in one of two formats:
>     - A numerical value representing the number of milliseconds since the
>       Unix epoch.
>     - A string in RFC 3339 format. For example,
>       {@code 2015-10-29T23:41:41.123Z}. The sub-second component of the
>       timestamp is optional, and digits beyond the first three (i.e., time
>       units smaller than milliseconds) will be ignored.
>
> Code: https://github.com/udim/beam/blob/b981dd618e9e1f667597eec2a91c7265a389c405/sdks/python/apache_beam/io/gcp/pubsub.py
> PR: https://github.com/apache/beam/pull/4901
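
On the timestamp_attribute rules quoted above: to make the two accepted formats concrete, here is a rough sketch of how an attribute value might be normalized to milliseconds since the Unix epoch. This is only an illustration under my own assumptions, not the code in the PR. The function name timestamp_attribute_to_millis is hypothetical, and I am assuming the attribute arrives as a string and that malformed values should raise ValueError.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 date-time: the fractional seconds are optional, the offset is
# either "Z" or a numeric "+hh:mm" / "-hh:mm".
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?"
    r"([Zz]|[+-]\d{2}:\d{2})$"
)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_attribute_to_millis(value):
    """Hypothetical helper: return milliseconds since the Unix epoch.

    Accepts either a numerical value (already in milliseconds) or an
    RFC 3339 string, as described in the proposal.
    """
    text = str(value).strip()

    # Format 1: a plain integer is taken as milliseconds since the epoch.
    if re.fullmatch(r"-?\d+", text):
        return int(text)

    # Format 2: an RFC 3339 string.
    match = _RFC3339.match(text)
    if match is None:
        raise ValueError("Unsupported timestamp format: %r" % (value,))

    year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6]
    )

    # Keep only the first three fractional digits (milliseconds) and
    # ignore anything finer, padding short fractions such as ".5".
    fraction = match.group(7) or ""
    millis = int(fraction[:3].ljust(3, "0")) if fraction else 0

    offset = match.group(8)
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(
            sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        )

    moment = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    delta = moment - _EPOCH
    # Integer arithmetic avoids floating-point rounding of the result.
    return (delta.days * 86400 + delta.seconds) * 1000 + millis
```

With this sketch, "2015-10-29T23:41:41.123Z" and "2015-10-29T23:41:41.123456Z" give the same result, since the digits after the third are dropped, and a numeric string is passed through unchanged.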
