Hi Julian,
We are interested in both and I have logged CALCITE-1912 for supporting “FOR
SYSTEM_TIME AS OF” in regular queries.
Regarding to (b), you mean it can be covered by standard. I agree, the query I
posted in the previous mail can be rewrote as a subquery:
SELECT *
FROM Orders AS o
JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND
sysEnd > O.orderTime) AS P
ON o.productId = p.productId
But subquery is too complex for users, the period condition is trivial for
users. Using the “FOR SYSTEM_TIME AS OF” is much simpler. Yes, you are right
that the standard says it can’t contain a column reference. We initialize this
discuss as we would like to "extend" the standard to simplify such query. What
do you think?
- Jark Wu
> 在 2017年8月1日,上午2:58,Julian Hyde <[email protected]> 写道:
>
> Can you clarify whether you are interested in (a) streaming queries, (b)
> temporal non-streaming queries, or both? If (b), it is covered by the
> standard, and we should follow the standard, period. If (a), syntax and
> semantics are still under discussion.
>
> If you want both, could we perhaps separate the streaming and non-streaming
> cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in
> regular (non-streaming) queries?
>
>> The SQL standard doesn’t explicitly say the <point in time> must be a
>> constant, just all the databases only support constant.
>
> I disagree. The standard says that it cannot contain a column reference:
>
>> If <query system time period specification> QSTPS is specified, then:
>> a) The table identified by the <table or query name> shall be a
>> system-versioned table.
>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified,
>> then ASYMMETRIC is implicit.
>> c) QSTPS shall not contain a <column reference> or an <SQL parameter
>> reference>.
>
> We can go beyond the standard, but let’s be clear what it says.
>
> Julian
>
>
>
>
>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <[email protected]> wrote:
>>
>> Hi Julian,
>>
>> The syntax you posted looks very good to me. Besides of that, we hope to
>> extend the <point in time> to support column reference(i.e. LATERAL +
>> temporal syntax).
>>
>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It
>> is still a query constant, but what we want is a variable timestamp changing
>> with every record. The example in “Streams, joins and temporal tables” [1]
>> fits our requirement much more. It uses the table-value function to express
>> and not introduce new syntax. But it it is not a standard to manipulate
>> temporal table. And it can only express querying the table content as of a
>> specified point in time, but can not express a period time (can it?). That’s
>> why we prefer the standard temporal syntax.
>>
>> The SQL standard doesn’t explicitly say the <point in time> must be a
>> constant, just all the databases only support constant. But I think, it is
>> very useful to support column reference. For example:
>>
>> SELECT *
>> FROM Orders AS o
>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>> ON o.productId = p.productId
>>
>> This is actually what we want.
>>
>> What do you think?
>>
>> [1]
>> https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#
>>
>> <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>> - Jark Wu
>>
>>> 在 2017年7月29日,上午2:16,Julian Hyde <[email protected]> 写道:
>>>
>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>> think we should go with that approach.
>>>
>>> The standard explicitly says "QSTPS shall not contain a <column
>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>> that, but I think we can do what you want without bind variables:
>>>
>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>>
>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>> value is constant throughout the execution of the query. So it behaves
>>> somewhat like a bind variable.)
>>>
>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>> timestamp literal.)
>>>
>>> The standard has this:
>>>
>>> <table primary> ::=
>>> <table or query name>
>>> [ <query system time period specification> ]
>>> [ <correlation or recognition> ]
>>> | ...
>>>
>>> <correlation or recognition> ::=
>>> [ AS ] <correlation name>
>>> [ <parenthesized derived column list> ]
>>> | ...
>>>
>>> <query system time period specification> ::=
>>> FOR SYSTEM_TIME AS OF <point in time 1>
>>> | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>> <point in time 1> AND <point in time 2>
>>> | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>>
>>> <point in time 1> ::=
>>> <point in time>
>>>
>>> <point in time 2> ::=
>>> <point in time>
>>>
>>> <point in time> ::=
>>> <datetime value expression>
>>>
>>> Julian
>>>
>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <[email protected]> wrote:
>>>> Hi Julian,
>>>>
>>>> Thanks for the valuable input. We have read “Streams, joins and temporal
>>>> tables” you proposed before I started the discussion. The “temporal table
>>>> view” proposed in the doc is very interesting and well-defined. But we hope
>>>> to support this feature based on the standard SQL semantics.
>>>>
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>>
>>>>
>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2],
>>>> it
>>>> follows after table not the whole query. And I also tried in SQL Server,
>>>> the
>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>> temporal table with the same schema. The query chooses different timestamp
>>>> of table TemporalRates and TemporalRates2:
>>>>
>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>> 07:49:07.5649523' AS T2
>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>>> T1
>>>> ON T1.id = T2.id;
>>>>
>>>> So I think the only controversial point is it only allows constant
>>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>>>
>>>> DECLARE @ADayAgo datetime2
>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>> /*Comparison between two points in time for subset of rows*/
>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>> JOIN [Department] AS D ON D_1_Ago.[DeptID] = [D].[DeptID]
>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>>
>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>> automatically varying. Which is very useful in many scenarios.
>>>>
>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>
>>>> - Jark Wu
>>>>
>>>> 在 2017年7月28日,上午9:03,Julian Hyde <[email protected]> 写道:
>>>>
>>>> Jark,
>>>>
>>>> One more thing.
>>>>
>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have
>>>> concrete
>>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>>> of thought, seems to have incorporated my thinking, and generally does
>>>> excellent work.
>>>>
>>>> So, definitely give his document serious consideration, as I am.
>>>>
>>>> When we all agree that we have the concepts right, I think it likely that
>>>> we
>>>> can embrace the syntax and semantics of temporal support that were
>>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>>> for areas that it already covers. And by the way, I added PERIOD support to
>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>>
>>>> Julian
>>>>
>>>> [1] http://s.apache.org/streaming-sql-spec
>>>>
>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>>
>>>>
>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <[email protected]> wrote:
>>>>
>>>> This looks very interesting.
>>>>
>>>> Are you familiar with the proposal I created in September, "Streams,
>>>> joins and temporal tables"[1]? I started thinking about
>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>> up with temporal database semantics.
>>>>
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>>
>>>> I think you have come to similar conclusions. I would like to hear how
>>>> your proposal fits with mine.
>>>>
>>>>
>>>> Julian
>>>>
>>>> [1]
>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>>
>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <[email protected]>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We are looking for the temporal support in Calcite, or so-called
>>>> system-versioned temporal table. The temporal database feature was
>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>>>> system-versioned temporal table is designed to keep a full history of data
>>>> changes and allow easy point in time analysis. It is very useful for
>>>> scenarios that require tracking history of data changes.
>>>>
>>>> Because system-versioned tables are intended primarily for tracking
>>>> historical data changes, queries on system-versioned tables often tend to
>>>> be
>>>> concerned with retrieving the table content as of a given point in time or
>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>>> OF syntactic extension for this specific purpose. For example, the
>>>> following
>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>>
>>>> In addition, we need the time expression can be a relational expression
>>>> whose value is from another table. For example, the following query joins
>>>> the Orders to the Prices as the price was at the order time:
>>>> SELECT STREAM *
>>>> FROM Orders AS o
>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>> ON o.productId = p.productId
>>>>
>>>> So I would like to introduce the syntactic extension in Calcite. What do
>>>> you
>>>> think about this? Any comments or suggestions are welcome!
>>>>
>>>> [1]
>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>> [2]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>>
>>>> Bests,
>>>> Jark Wu
>>>>
>>>>
>>