Oops, I saw that you have addressed that comment in https://issues.apache.org/jira/browse/CALCITE-1912 <https://issues.apache.org/jira/browse/CALCITE-1912>. Let’s discuss there.
> On Aug 2, 2017, at 11:34 AM, Julian Hyde <[email protected]> wrote: > > I do think that the best way to write the query is > > SELECT * > FROM Orders AS o > JOIN ProductPrices FOR SYSTEM_TIME AS OF o.OrderType AS p > ON o.productId = p.productId > > But do we agree that this goes beyond the standard? It breaches "c) QSTPS > shall not contain a <column reference> or an <SQL parameter reference>.” > because “o.OrderType” is a column reference. > > Julian > >> On Aug 1, 2017, at 1:33 AM, 伍翀(云邪) <[email protected]> wrote: >> >> Hi Julian, >> >> We are interested in both and I have logged CALCITE-1912 for supporting “FOR >> SYSTEM_TIME AS OF” in regular queries. >> >> Regarding to (b), you mean it can be covered by standard. I agree, the query >> I posted in the previous mail can be rewrote as a subquery: >> >> SELECT * >> FROM Orders AS o >> JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime >> AND sysEnd > O.orderTime) AS P >> ON o.productId = p.productId >> >> But subquery is too complex for users, the period condition is trivial for >> users. Using the “FOR SYSTEM_TIME AS OF” is much simpler. Yes, you are >> right that the standard says it can’t contain a column reference. We >> initialize this discuss as we would like to "extend" the standard to >> simplify such query. What do you think? >> >> - Jark Wu >> >>> 在 2017年8月1日,上午2:58,Julian Hyde <[email protected]> 写道: >>> >>> Can you clarify whether you are interested in (a) streaming queries, (b) >>> temporal non-streaming queries, or both? If (b), it is covered by the >>> standard, and we should follow the standard, period. If (a), syntax and >>> semantics are still under discussion. >>> >>> If you want both, could we perhaps separate the streaming and non-streaming >>> cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in >>> regular (non-streaming) queries? >>> >>>> The SQL standard doesn’t explicitly say the <point in time> must be a >>>> constant, just all the databases only support constant. >>> >>> I disagree. The standard says that it cannot contain a column reference: >>> >>>> If <query system time period specification> QSTPS is specified, then: >>>> a) The table identified by the <table or query name> shall be a >>>> system-versioned table. >>>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is >>>> specified, then ASYMMETRIC is implicit. >>>> c) QSTPS shall not contain a <column reference> or an <SQL parameter >>>> reference>. >>> >>> We can go beyond the standard, but let’s be clear what it says. >>> >>> Julian >>> >>> >>> >>> >>>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <[email protected]> wrote: >>>> >>>> Hi Julian, >>>> >>>> The syntax you posted looks very good to me. Besides of that, we hope to >>>> extend the <point in time> to support column reference(i.e. LATERAL + >>>> temporal syntax). >>>> >>>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It >>>> is still a query constant, but what we want is a variable timestamp >>>> changing with every record. The example in “Streams, joins and temporal >>>> tables” [1] fits our requirement much more. It uses the table-value >>>> function to express and not introduce new syntax. But it it is not a >>>> standard to manipulate temporal table. And it can only express querying >>>> the table content as of a specified point in time, but can not express a >>>> period time (can it?). That’s why we prefer the standard temporal syntax. >>>> >>>> The SQL standard doesn’t explicitly say the <point in time> must be a >>>> constant, just all the databases only support constant. But I think, it is >>>> very useful to support column reference. For example: >>>> >>>> SELECT * >>>> FROM Orders AS o >>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p >>>> ON o.productId = p.productId >>>> >>>> This is actually what we want. >>>> >>>> What do you think? >>>> >>>> [1] >>>> https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# >>>> >>>> <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#> >>>> - Jark Wu >>>> >>>>> 在 2017年7月29日,上午2:16,Julian Hyde <[email protected]> 写道: >>>>> >>>>> '@' is specific to SQL Server, and a form of bind variable, so I don't >>>>> think we should go with that approach. >>>>> >>>>> The standard explicitly says "QSTPS shall not contain a <column >>>>> reference> or an <SQL parameter reference>." We may choose to ignore >>>>> that, but I think we can do what you want without bind variables: >>>>> >>>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY >>>>> >>>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its >>>>> value is constant throughout the execution of the query. So it behaves >>>>> somewhat like a bind variable.) >>>>> >>>>> Can we go with the syntax in the standard? (The Microsoft syntax >>>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I >>>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which >>>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a >>>>> timestamp literal.) >>>>> >>>>> The standard has this: >>>>> >>>>> <table primary> ::= >>>>> <table or query name> >>>>> [ <query system time period specification> ] >>>>> [ <correlation or recognition> ] >>>>> | ... >>>>> >>>>> <correlation or recognition> ::= >>>>> [ AS ] <correlation name> >>>>> [ <parenthesized derived column list> ] >>>>> | ... >>>>> >>>>> <query system time period specification> ::= >>>>> FOR SYSTEM_TIME AS OF <point in time 1> >>>>> | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ] >>>>> <point in time 1> AND <point in time 2> >>>>> | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2> >>>>> >>>>> <point in time 1> ::= >>>>> <point in time> >>>>> >>>>> <point in time 2> ::= >>>>> <point in time> >>>>> >>>>> <point in time> ::= >>>>> <datetime value expression> >>>>> >>>>> Julian >>>>> >>>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <[email protected]> >>>>> wrote: >>>>>> Hi Julian, >>>>>> >>>>>> Thanks for the valuable input. We have read “Streams, joins and temporal >>>>>> tables” you proposed before I started the discussion. The “temporal table >>>>>> view” proposed in the doc is very interesting and well-defined. But we >>>>>> hope >>>>>> to support this feature based on the standard SQL semantics. >>>>>> >>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that >>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but >>>>>> you couldn't choose for table A to be at timestamp X and table B to be >>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be >>>>>> constant, whereas we require the timestamp to be automatically >>>>>> varying. >>>>>> >>>>>> >>>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in >>>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s >>>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s >>>>>> example[2], it >>>>>> follows after table not the whole query. And I also tried in SQL Server, >>>>>> the >>>>>> following SQL works well. The TemporalRates and TemporalRates2 are both >>>>>> temporal table with the same schema. The query chooses different >>>>>> timestamp >>>>>> of table TemporalRates and TemporalRates2: >>>>>> >>>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28 >>>>>> 07:49:07.5649523' AS T2 >>>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS >>>>>> T1 >>>>>> ON T1.id = T2.id; >>>>>> >>>>>> So I think the only controversial point is it only allows constant >>>>>> timestamp. Then I find the following example from SQL Server doc [3] : >>>>>> >>>>>> DECLARE @ADayAgo datetime2 >>>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime()) >>>>>> /*Comparison between two points in time for subset of rows*/ >>>>>> SELECT D_1_Ago.[DeptID], D.[DeptID], >>>>>> D_1_Ago.[DeptName], D.[DeptName], >>>>>> D_1_Ago.[SysStartTime], D.[SysStartTime], >>>>>> D_1_Ago.[SysEndTime], D.[SysEndTime] >>>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago >>>>>> JOIN [Department] AS D ON D_1_Ago.[DeptID] = [D].[DeptID] >>>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ; >>>>>> >>>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering >>>>>> whether it is acceptable to support LATERAL temporal to make the >>>>>> timestamp >>>>>> automatically varying. Which is very useful in many scenarios. >>>>>> >>>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql >>>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1 >>>>>> [3] >>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table >>>>>> >>>>>> - Jark Wu >>>>>> >>>>>> 在 2017年7月28日,上午9:03,Julian Hyde <[email protected]> 写道: >>>>>> >>>>>> Jark, >>>>>> >>>>>> One more thing. >>>>>> >>>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam, >>>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have >>>>>> concrete >>>>>> feedback but I will say now that Tyler has been giving temporal joins a >>>>>> lot >>>>>> of thought, seems to have incorporated my thinking, and generally does >>>>>> excellent work. >>>>>> >>>>>> So, definitely give his document serious consideration, as I am. >>>>>> >>>>>> When we all agree that we have the concepts right, I think it likely >>>>>> that we >>>>>> can embrace the syntax and semantics of temporal support that were >>>>>> introduced in SQL:2011. It’s important that we stay within the SQL >>>>>> standard >>>>>> for areas that it already covers. And by the way, I added PERIOD support >>>>>> to >>>>>> Calcite a while ago because that looked useful and un-contraversial. >>>>>> >>>>>> Julian >>>>>> >>>>>> [1] http://s.apache.org/streaming-sql-spec >>>>>> >>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715 >>>>>> >>>>>> >>>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <[email protected]> wrote: >>>>>> >>>>>> This looks very interesting. >>>>>> >>>>>> Are you familiar with the proposal I created in September, "Streams, >>>>>> joins and temporal tables"[1]? I started thinking about >>>>>> stream-to-table joins, where the tables where time-varying, and ended >>>>>> up with temporal database semantics. >>>>>> >>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that >>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but >>>>>> you couldn't choose for table A to be at timestamp X and table B to be >>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be >>>>>> constant, whereas we require the timestamp to be automatically >>>>>> varying. >>>>>> >>>>>> I think you have come to similar conclusions. I would like to hear how >>>>>> your proposal fits with mine. >>>>>> >>>>>> >>>>>> Julian >>>>>> >>>>>> [1] >>>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E >>>>>> >>>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <[email protected]> >>>>>> wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> We are looking for the temporal support in Calcite, or so-called >>>>>> system-versioned temporal table. The temporal database feature was >>>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 >>>>>> [2]. A >>>>>> system-versioned temporal table is designed to keep a full history of >>>>>> data >>>>>> changes and allow easy point in time analysis. It is very useful for >>>>>> scenarios that require tracking history of data changes. >>>>>> >>>>>> Because system-versioned tables are intended primarily for tracking >>>>>> historical data changes, queries on system-versioned tables often tend >>>>>> to be >>>>>> concerned with retrieving the table content as of a given point in time >>>>>> or >>>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME >>>>>> AS >>>>>> OF syntactic extension for this specific purpose. For example, the >>>>>> following >>>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011: >>>>>> SELECT ENo,EName,Sys_Start,Sys_End >>>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00' >>>>>> >>>>>> In addition, we need the time expression can be a relational expression >>>>>> whose value is from another table. For example, the following query joins >>>>>> the Orders to the Prices as the price was at the order time: >>>>>> SELECT STREAM * >>>>>> FROM Orders AS o >>>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p >>>>>> ON o.productId = p.productId >>>>>> >>>>>> So I would like to introduce the syntactic extension in Calcite. What do >>>>>> you >>>>>> think about this? Any comments or suggestions are welcome! >>>>>> >>>>>> [1] >>>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf >>>>>> [2] >>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables >>>>>> [3] >>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table >>>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/ >>>>>> >>>>>> Bests, >>>>>> Jark Wu >>>>>> >>>>>> >>>> >> >
