Hi fellows, long time no see on the mailing list. I would like to start a discussion on the join syntax for our recently introduced window table functions.
For example, we can define a tumbling window function with a 5-minute size as:

    Tumble(table T, descriptor(T.ts), INTERVAL '5' MINUTE)

Then we can select from it. Moreover, I want to support joining two window functions in streaming queries.

The semantics of the windowed stream join are:

• The two window inputs must have the same window arguments (except for the table name), e.g. for TUMBLE the size must be equal; for HOP, both the slide interval and the size must be equal.
• We first window each input stream, then join the two windowed data sets that belong to the same TimeWindow.
• The join is triggered by the watermark of the stream.
• The join does not produce retractions; this is the main difference from a normal two-stream join.

I want to propose the following join syntax:

    SELECT L.f0, R.f2, L.window_start, L.window_end
    FROM Tumble(table T1, descriptor(T1.ts), INTERVAL '5' MINUTE) L
    JOIN Tumble(table T2, descriptor(T2.ts), INTERVAL '5' MINUTE) R
    ON L.f0 = R.f0
    AND L.window_start = R.window_start
    AND L.window_end = R.window_end

The part I want to discuss is the window condition (L.window_start = R.window_start AND L.window_end = R.window_end, marked in red in the original message). It seems too verbose, because users need to declare it every time.

• Should we make it optional?
• Is there a better syntax to describe this window join semantics?

Best,
Danny Chan
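
P.S. To make the proposed semantics a bit more concrete, here is a rough Python sketch of how I picture them. It is only an illustration, not how the planner or runtime would implement the join. The names `tumble` and `window_join`, the `(ts, key, value)` row layout, and the rule that a window fires once the watermark has reached its end are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window, as in the example above


def tumble(rows, size_ms=WINDOW_MS):
    """Assign each (ts, key, value) row to its tumbling window [start, end)."""
    windows = defaultdict(list)
    for ts, key, value in rows:
        start = ts - ts % size_ms  # assumes non-negative millisecond timestamps
        windows[(start, start + size_ms)].append((key, value))
    return windows


def window_join(left_rows, right_rows, watermark, size_ms=WINDOW_MS):
    """Inner-join rows that fall into the same window on both sides.

    Only windows whose end is at or before the watermark are emitted
    (a simplifying assumption about when a window fires).
    """
    left = tumble(left_rows, size_ms)
    right = tumble(right_rows, size_ms)
    out = []
    for start, end in sorted(left.keys() & right.keys()):
        if end > watermark:
            continue  # window still open: nothing emitted, so nothing to retract
        for lkey, lval in left[(start, end)]:
            for rkey, rval in right[(start, end)]:
                if lkey == rkey:  # the ordinary join key, L.f0 = R.f0
                    out.append((lval, rval, start, end))
    return out


left = [(1000, "a", "l1"), (299999, "b", "l2"), (300000, "a", "l3")]
right = [(2000, "a", "r1"), (310000, "a", "r2")]
first = window_join(left, right, watermark=300000)
both = window_join(left, right, watermark=600000)
```

In this sketch the window equality is implied by the shared `(start, end)` dictionary key, so only the ordinary key comparison has to be written out. That is roughly what making the window condition optional would mean for the user.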
