I think this thread is the only discussion of it. I still favor a separate transform SqlTransform.withMetadata().query(...). That way, there's no change to SQL.
Kenn On Wed, Aug 21, 2019 at 12:53 AM Reza Rokni <r...@google.com> wrote: > @Kenn / @Rob have there been any other discussions on how the timestamp > value can be accessed from within the SQL since this thread in May? > > If not, my vote is for a convenience method that gives access to the > timestamp as a function call within the SQL statement. > > Reza > > On Wed, 22 May 2019 at 10:06, Reza Rokni <r...@google.com> wrote: > >> Hi, >> >> Coming back to this: do we have enough of a consensus to say that in >> principle this is a good idea? If yes, I will raise a Jira for this. >> >> Cheers >> >> Reza >> >> On Thu, 16 May 2019 at 02:58, Robert Bradshaw <rober...@google.com> >> wrote: >> >>> On Wed, May 15, 2019 at 8:51 PM Kenneth Knowles <k...@apache.org> wrote: >>> > >>> > On Wed, May 15, 2019 at 3:05 AM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >> >>> >> Isn't there an API for concisely computing new fields from old ones? >>> >> Perhaps these expressions could contain references to metadata values >>> >> such as the timestamp. Otherwise, >>> > >>> > Even so, being able to refer to the timestamp implies something about >>> its presence in a namespace, shared with other user-decided names. >>> >>> I was thinking that functions may live in a different namespace than >>> fields. >>> >>> > And it may be nice for users to use that API within the composite >>> SqlTransform. I think there are a lot of options. >>> > >>> >> Rather than withMetadata reifying the value as a nested field, with >>> >> the timestamp, window, etc. at the top level, one could let it take a >>> >> field name argument that attaches all the metadata as an extra >>> >> (struct-like) field. This would be like attachX, but without having to >>> >> have a separate method for every X. >>> > >>> > If you leave the input field names at the top level, then any "attach" >>> style API requires choosing a name that doesn't conflict with input field >>> names. 
You can't write a generic transform that works with all inputs. I >>> think it is much simpler to move the input fields all into a nested >>> row/struct. Putting all the metadata in a second nested row/struct is just >>> as good as top-level, perhaps. But moving the input into the struct/row is >>> important. >>> >>> Very good point about writing generic transforms. It does mean a lot >>> of editing if one decides one wants to access the metadata field(s) >>> after-the-fact. (I also don't think we need to put the metadata in a >>> nested struct if the value is.) >>> >>> >> It seems restrictive to only consider this as a special mode for >>> >> SqlTransform rather than a more generic operation. (For SQL, my first >>> >> instinct would be to just make this a special function like >>> >> element_timestamp(), but there is some ambiguity there when there are >>> >> multiple tables in the expression.) >>> > >>> > I would propose it as both: we already have some Reify transforms, and >>> you could make a general operation that does this small data preparation >>> easily. I think the proposal is just to add a convenience build method on >>> SqlTransform to include the underlying functionality as part of the >>> composite, which we really already have. >>> > >>> > I don't think we should extend SQL with built-in functions for >>> element_timestamp() and things like that, because SQL already has TIMESTAMP >>> columns and it is very natural to use SQL on unbounded relations where the >>> timestamp is just part of the data. >>> >>> That's why I was suggesting a single element_metadata() rather than >>> exploding each one out. >>> >>> Do you have a pointer to what the TIMESTAMP columns are? (I'm assuming >>> this is a special field, but distinct from the metadata timestamp?) 
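The nested-row shape argued for above can be sketched against the Beam Java schema API. This is an illustrative sketch only: the field names (`event_timestamp`, `value`) are hypothetical, not an actual Beam expansion.

```java
import org.apache.beam.sdk.schemas.Schema;

// Sketch: wrap an arbitrary input schema so that all input fields live
// under a nested "value" row, leaving the metadata fields free of any
// possible name collision with user fields. Names are illustrative.
public class NestedSchemaSketch {
  public static Schema withMetadataSchema(Schema inputSchema) {
    return Schema.builder()
        .addDateTimeField("event_timestamp") // element timestamp metadata
        .addRowField("value", inputSchema)   // the original input, nested whole
        .build();
  }
}
```

Because the input schema is nested wholesale, a transform written against this shape works for any input schema, which is the generic-transform point made above.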
>>> >>> >> On Wed, May 15, 2019 at 5:03 AM Reza Rokni <r...@google.com> wrote: >>> >> > >>> >> > Hi, >>> >> > >>> >> > One use case would be when dealing with the windowing functions for >>> example: >>> >> > >>> >> > SELECT f_int, COUNT(*), TUMBLE_START(f_timestamp, INTERVAL '1' >>> HOUR) tumble_start >>> >> > FROM PCOLLECTION >>> >> > GROUP BY >>> >> > f_int, >>> >> > TUMBLE(f_timestamp, INTERVAL '1' HOUR) >>> >> > >>> >> > For an element which is using Metadata to inform the EventTime of >>> the element, rather than data within the element itself, I would need to >>> create a new schema which adds the timestamp as a field. I think other >>> examples which may be interesting include getting the value of a row with the >>> max/min timestamp. None of this would be difficult but it does feel a >>> little on the verbose side and also makes the pipeline a little harder to >>> read. >>> >> > >>> >> > Cheers >>> >> > Reza >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > From: Kenneth Knowles <k...@apache.org> >>> >> > Date: Wed, 15 May 2019 at 01:15 >>> >> > To: dev >>> >> > >>> >> >> We have support for nested rows so this should be easy. The >>> .withMetadata would reify the struct, moving from Row to WindowedValue<Row> >>> if I understand it... >>> >> >> >>> >> >> SqlTransform.query("SELECT field1 from PCOLLECTION"): >>> >> >> >>> >> >> Schema = { >>> >> >> field1: type1, >>> >> >> field2: type2 >>> >> >> } >>> >> >> >>> >> >> SqlTransform.query(...) >>> >> >> >>> >> >> SqlTransform.withMetadata().query("SELECT event_timestamp, >>> value.field1 FROM PCOLLECTION") >>> >> >> >>> >> >> Derived schema = { >>> >> >> event_timestamp: TIMESTAMP, >>> >> >> pane_info: { ... } >>> >> >> value: { >>> >> >> field1: type1, >>> >> >> field2: type2, >>> >> >> ... >>> >> >> } >>> >> >> } >>> >> >> >>> >> >> SqlTransform would expand into a different composite, and it would >>> be a straightforward ParDo to adjust the data, possibly automatic via the >>> new schema conversions. 
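The "straightforward ParDo to adjust the data" might look roughly like the following. This is a hedged sketch, not Beam's actual expansion: `withMetadata()` is only a proposal in this thread, and the class, method, and field names here are illustrative.

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// Sketch: re-wrap each Row so the element timestamp travels as an ordinary
// field next to the nested original value. Names are illustrative.
class ReifyTimestampSketch {
  // Derived schema: { event_timestamp: DATETIME, value: ROW<input schema> }
  static Schema derive(Schema valueSchema) {
    return Schema.builder()
        .addDateTimeField("event_timestamp")
        .addRowField("value", valueSchema)
        .build();
  }

  static PCollection<Row> reify(PCollection<Row> input) {
    final Schema derived = derive(input.getSchema());
    return input
        .apply(ParDo.of(new DoFn<Row, Row>() {
          @ProcessElement
          public void process(ProcessContext c) {
            c.output(Row.withSchema(derived)
                .addValue(c.timestamp().toDateTime()) // element timestamp
                .addValue(c.element())                // original row, nested
                .build());
          }
        }))
        .setRowSchema(derived);
  }
}
```

After a step like this, the proposed query, SELECT event_timestamp, value.field1 FROM PCOLLECTION, is just ordinary SQL over ordinary fields.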
>>> >> >> >>> >> >> Embedding the window would be a bit wonky, something like { >>> end_of_window: TIMESTAMP, encoded_window: bytes } which would be expensive >>> due to encoding. But timestamp and pane info not so bad. >>> >> >> >>> >> >> Kenn >>> >> >> >>> >> >> From: Anton Kedin <ke...@google.com> >>> >> >> Date: Tue, May 14, 2019 at 9:17 AM >>> >> >> To: <dev@beam.apache.org> >>> >> >> >>> >> >>> Reza, can you share more thoughts on how you think this can work >>> end-to-end? >>> >> >>> >>> >> >>> Currently the approach is that populating the rows with the data >>> happens before the SqlTransform, and within the query you can only use the >>> things that are already in the rows or in the catalog/schema (or built-in >>> things). In the general case, populating the rows with any data can be solved >>> via a ParDo before SqlTransform. Do you think this approach lacks something >>> or is maybe too verbose? >>> >> >>> >>> >> >>> My thoughts on this, lacking more info or concrete examples: in >>> order to access a timestamp value from within a query there has to be a >>> syntax for it. Field access expressions or function calls are the only >>> things that come to mind among existing syntax features that would allow >>> that. Making the timestamp a field of the data row makes more sense to me here >>> because in Beam it is already a part of the element. It's not a result of a >>> function call and it's already easily accessible, so it doesn't make sense to >>> build extra functions here. One of the problems with both approaches, >>> however, is the potential conflicts with the existing schema of the data >>> elements (or the schema/catalog of the data source in general). E.g. if we >>> add a magical "event_timestamp" column or "event_timestamp()" function >>> there may potentially already exist a field or a function in the schema >>> with this name. 
This can be solved in a couple of ways, but we will probably >>> want to provide a configuration mechanism to assign different >>> field/function names in case of conflicts. >>> >> >>> >>> >> >>> Given that, it may make sense to allow users to attach the whole >>> pane info or some subset of it to the row (e.g. only the timestamp), and >>> make that configurable. However, I am not sure whether exposing something >>> like pane info is enough and will cover a lot of useful cases. Plus adding >>> methods like `attachTimestamp("fieldname")` or >>> `attachWindowInfo("fieldname")` might open a portal to an ever-increasing >>> collection of these `attachX()`, `attachY()` that can make the API less >>> usable. If, on the other hand, we make it more generic, then it will >>> probably have to look a lot like a ParDo or MapElements.via() anyway. And >>> at that point the question would be whether it makes sense to build >>> something extra that probably looks and functions like an existing feature. >>> >> >>> >>> >> >>> Regards, >>> >> >>> Anton >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> From: Andrew Pilloud <apill...@google.com> >>> >> >>> Date: Tue, May 14, 2019 at 7:29 AM >>> >> >>> To: dev >>> >> >>> >>> >> >>>> Hi Reza, >>> >> >>>> >>> >> >>>> Where will this metadata be coming from? Beam SQL is tightly >>> coupled with the schema of the PCollection, so adding fields not in the >>> data would be difficult. >>> >> >>>> >>> >> >>>> If what you want is the timestamp out of the DoFn.ProcessContext, >>> we might be able to add a SQL function to fetch that. 
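The configurable-field-name idea raised above could be sketched like this. The helper and its name are hypothetical, not an existing Beam API; the point is the explicit conflict check against the input schema.

```java
import org.apache.beam.sdk.schemas.Schema;

// Sketch: attach the element timestamp as a flat column whose name the
// caller chooses, failing fast if the name collides with an input field.
// This helper is hypothetical; it is not part of the Beam API.
class AttachTimestampSketch {
  static Schema withTimestampField(Schema input, String fieldName) {
    if (input.hasField(fieldName)) {
      throw new IllegalArgumentException(
          "field name '" + fieldName + "' conflicts with the input schema");
    }
    Schema.Builder builder = Schema.builder();
    input.getFields().forEach(builder::addField); // copy input fields as-is
    return builder.addDateTimeField(fieldName).build();
  }
}
```

This keeps the flat layout but makes the conflict explicit, at the cost of the user choosing a non-conflicting name per pipeline.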
>>> >> >>>> >>> >> >>>> Andrew >>> >> >>>> >>> >> >>>> From: Reza Rokni <r...@google.com> >>> >> >>>> Date: Tue, May 14, 2019, 1:08 AM >>> >> >>>> To: <dev@beam.apache.org> >>> >> >>>> >>> >> >>>>> Hi, >>> >> >>>>> >>> >> >>>>> What are folks' thoughts about adding something like >>> SqlTransform.withMetadata().query(...) to enable users to access >>> things like Timestamp information from within the query without having to >>> reify the information into the element itself? >>> >> >>>>> >>> >> >>>>> Cheers >>> >> >>>>> Reza >>> >> >>>>> >>> >> >>>>> -- >>> >> >>>>> >>> >> >>>>> This email may be confidential and privileged. If you received >>> this communication by mistake, please don't forward it to anyone else, >>> please erase all copies and attachments, and please let me know that it has >>> gone to the wrong person. >>> >> >>>>> >>> >> >>>>> The above terms reflect a potential business arrangement, are >>> provided solely as a basis for further discussion, and are not intended to >>> be and do not constitute a legally binding obligation. No legally binding >>> obligations will be created, implied, or inferred until an agreement in >>> final form is executed in writing by all parties involved. 