Hi I am working on adding functionality to support querying Pubsub messages directly from Beam SQL.
*Goal* Provide Beam users a pure SQL solution to create the pipelines with Pubsub as a data source, without the need to set up the pipelines in Java before applying the query. *High level approach* - - Build on top of PubsubIO; - Pubsub source will be declared using CREATE TABLE DDL statement: - Beam SQL already supports declaring sources like Kafka and Text using CREATE TABLE DDL; - it supports additional configuration using TBLPROPERTIES clause. Currently it takes a text blob, where we can put a JSON configuration; - wrapping PubsubIO into a similar source looks feasible; - The plan is to initially support messages only with JSON payload: - - more payload formats can be added later; - Messages will be fully described in the CREATE TABLE statements: - event timestamps. Source of the timestamp is configurable. It is required by Beam SQL to have an explicit timestamp column for windowing support; - messages attributes map; - JSON payload schema; - Event timestamps will be taken either from publish time or user-specified message attribute (configurable); Thoughts, ideas, comments? More details are in the doc here: https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE Thank you, Anton