Hi Julian Feinauer, Do share your thoughts on MATCH_RECOGNIZE operator suitability, http://mail-archives.apache.org/mod_mbox/calcite-dev/201807.mbox/%[email protected]%3e
Cheers, Mark. > On 22 Oct 2018, at 02:24, Julian Feinauer <[email protected]> > wrote: > > Dear calcite devs, > > I follow the project for a long time and love how calcite made it possible to > use SQL everywhere (have done several sql interfaces on top of specific file > formats myself). I also like the strong support for streaming SQL. > > The reason I'm writing this email is not only to give the project some love > but because we are thinking about a SQL "extension" which I think is not so > specific but could serve others as well in different use cases. > > In detail, we are working with Streams of Data from Devices (think of > industry 4.0). We read data, e.g., from PLCs (using the (incubating) Apache > PLC4X project where I contribute) and do analytics on them. The analysis > which are done there are pretty similar when working with traces from tests, > e.g., automotive test drives or from related industries. What all these > streams have in common is > * usually ordered by time > * elements of different groups of signals ("rows" from "tables") arrive > ordered by time but not with equal timestamps, e.g., time each second, other > quantities much more frequent > * "natural" join for these signal groups ("tables") is some kind of > interpolation (sample and hold, linear interpolation, splinces, ...) with > respect to (event-)time > * In some cases signal types are not known and can only be guessed based on > first value, e.g., on CAN there is no strict notion of "double" or "integer" > channels but rather there are integer base values + a conversion formula > (like a x + b) + possible lookup tables for "other" values (SNA, NULL, > DISABLED, ...) > > On the other hand the analysis we like to perform are often timestamps > * get timestamps where a condition becomes true > * boolean value toggled > * numeric value is above / below threshold > * signal change rate is above / below threshold > * ... > * get the values of certain signals at the point in time when a condition > becomes true (see above) > * get windows based on conditions > * while signal is true > * while value above ... > * ... > * Do aggregations on signals in the mentioned windows > > Parts of this could done in most SQL dialects (I'm no expert for the standard > but in Postgres one could use LAG and partitions) but this is not efficient > and not all of the above could be done with that. > So we think about an extension (or a dialect) for "traces" or "time series" > which has a syntax that is slightly extended to allow such queries as stated > above. > > To give you an example of what such an extension could look like: > > ``` > SELECT start(), end(), MAX(current) FROM s7://127.0.0.1/0/0 WHILE > cycle_in_progress = TRUE > SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE > cycle_in_progress = TRUE TRIGGER ON_BECOME_TRUE > SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE > cycle_in_progress = TRUE TRIGGER ON_BECOME_TRUE > ``` > > Why am I bothering you with this? > Well, first, you are experts and I would love to get some feedback on > thoughts of that. > But, most important, I am thinking about writing (yet another) SQL parser > with slight extensions and would then have to care for a "runtime" which > would be partially similar (in functionality, not in maturity or > sophistication) to Calcites Enumerable-Trait. So I was thinking whether there > is a way to make all of this work "on top" of Calcite (custom RelNodes and an > extension to the parser) but I'm unsure about that as some of the internals > of Calcite are tied very specifically to Sql... like, e.g., SqlToRelConverter. > Do you have any ideas on how one would be able to implement this "minimaly > invasive" on top of Calcite and whether this is possible "ex-situ" or if this > should then be done in the same codebase (e.g. a subproject) as it would need > some changes near Calcites core? > > Please excuse this rather long email but I would really appreciate any > answers, comments or suggestions. > > Best > Julian
