This sounds awesome!

Is the event timestamp something that we need to specify for every source? If
so, I would suggest we add it as a first-class option on CREATE TABLE
rather than something hidden in TBLPROPERTIES.
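
For reference, here is how I read the timestamp choice in the proposal below
(publish time, or a user-specified message attribute). This is only a rough
sketch: the function and parameter names are made up for illustration and are
not PubsubIO or Beam SQL API, and the millis-since-epoch attribute encoding is
my assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resolve_event_timestamp(
    publish_time: datetime,
    attributes: Mapping[str, str],
    timestamp_attribute: Optional[str] = None,
) -> datetime:
    """Pick the event timestamp for one message (hypothetical helper).

    With no attribute configured, fall back to the publish time.
    Otherwise read the named attribute, assumed here to hold
    milliseconds since the Unix epoch as a decimal string.
    """
    if timestamp_attribute is None:
        return publish_time
    millis = int(attributes[timestamp_attribute])
    return EPOCH + timedelta(milliseconds=millis)
```

Whichever way it is configured, each source ends up needing exactly one such
setting, which is why it feels like a table-level option to me.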

Andrew

On Wed, May 2, 2018 at 10:30 AM Anton Kedin <ke...@google.com> wrote:

> Hi
>
> I am working on adding functionality to support querying Pubsub messages
> directly from Beam SQL.
>
> *Goal*
>   Provide Beam users with a pure SQL solution for creating pipelines with
> Pubsub as a data source, without the need to set up the pipelines in Java
> before applying the query.
>
> *High level approach*
>
>    - Build on top of PubsubIO;
>    - The Pubsub source will be declared using a CREATE TABLE DDL statement:
>       - Beam SQL already supports declaring sources like Kafka and Text
>       using CREATE TABLE DDL;
>       - it supports additional configuration using the TBLPROPERTIES clause.
>       Currently it takes a text blob, where we can put a JSON configuration;
>       - wrapping PubsubIO into a similar source looks feasible;
>    - The plan is to initially support only messages with a JSON payload:
>       - more payload formats can be added later;
>    - Messages will be fully described in the CREATE TABLE statements:
>       - event timestamps. The source of the timestamp is configurable. Beam
>       SQL requires an explicit timestamp column for windowing support;
>       - message attributes map;
>       - JSON payload schema;
>    - Event timestamps will be taken either from the publish time or from a
>    user-specified message attribute (configurable);
>
> Thoughts, ideas, comments?
>
> More details are in the doc here:
> https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE
>
>
> Thank you,
> Anton
>
