This sounds awesome! Is the event timestamp something that we need to specify for every source? If so, I would suggest we add it as a first-class option on CREATE TABLE rather than something hidden in TBLPROPERTIES.
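
To make the distinction concrete, here is a rough sketch of the two shapes. Everything in it is hypothetical and only for illustration: the table and column names, the topic path, the "timestampAttributeKey" property, and especially the TIMESTAMP ATTRIBUTE clause are assumptions, not existing Beam SQL syntax.

```sql
-- Hypothetical syntax, for illustration only; not existing Beam SQL DDL.

-- Shape 1: the timestamp source is buried in the TBLPROPERTIES JSON blob.
CREATE TABLE pubsub_messages (
    event_timestamp TIMESTAMP,
    attributes MAP<VARCHAR, VARCHAR>,
    payload ROW<id INTEGER, name VARCHAR>
)
TYPE 'pubsub'
LOCATION 'projects/my-project/topics/my-topic'
TBLPROPERTIES '{"timestampAttributeKey": "ts"}';

-- Shape 2: the timestamp source is a first-class clause of CREATE TABLE.
-- The TIMESTAMP ATTRIBUTE clause below is invented for this sketch.
CREATE TABLE pubsub_messages (
    event_timestamp TIMESTAMP,
    attributes MAP<VARCHAR, VARCHAR>,
    payload ROW<id INTEGER, name VARCHAR>
)
TYPE 'pubsub'
LOCATION 'projects/my-project/topics/my-topic'
TIMESTAMP ATTRIBUTE 'ts';
```

In the second shape the parser could validate the timestamp configuration directly, rather than each source digging it out of a free-form text blob.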
Andrew

On Wed, May 2, 2018 at 10:30 AM Anton Kedin <ke...@google.com> wrote:

> Hi
>
> I am working on adding functionality to support querying Pubsub messages
> directly from Beam SQL.
>
> *Goal*
> Provide Beam users a pure SQL solution to create the pipelines with
> Pubsub as a data source, without the need to set up the pipelines in Java
> before applying the query.
>
> *High level approach*
>
>    - Build on top of PubsubIO;
>    - Pubsub source will be declared using CREATE TABLE DDL statement:
>       - Beam SQL already supports declaring sources like Kafka and Text
>       using CREATE TABLE DDL;
>       - it supports additional configuration using TBLPROPERTIES clause.
>       Currently it takes a text blob, where we can put a JSON configuration;
>       - wrapping PubsubIO into a similar source looks feasible;
>    - The plan is to initially support messages only with JSON payload:
>       - more payload formats can be added later;
>    - Messages will be fully described in the CREATE TABLE statements:
>       - event timestamps. Source of the timestamp is configurable. It is
>       required by Beam SQL to have an explicit timestamp column for windowing
>       support;
>       - messages attributes map;
>       - JSON payload schema;
>    - Event timestamps will be taken either from publish time or
>    user-specified message attribute (configurable);
>
> Thoughts, ideas, comments?
>
> More details are in the doc here:
> https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE
>
> Thank you,
> Anton