Hello all

Today was the first day of ApacheCon North America. Among the various
presentations, one especially caught my attention:


      Streaming SQL with Apache Calcite

We mentioned in previous emails the possibility of using Calcite as the
SQL parser for our DataStores, such as ShapeFiles. The presentation that
I saw today makes Calcite more attractive, by opening possibilities to
couple such DataStores with e.g. SensorML.

The presentation recalled some advantages of SQL, including: we state
what we want rather than how to get it (we let the query optimizer
figure out the "how"), and data structures and indexes can change to
some extent without impacting the SQL statements.

Apache Calcite proposes an extension to the SQL language: the "SELECT
STREAM" statement. Consider first a classical statement in which
"Sensors" is a table:

    SELECT * FROM "Sensors" WHERE altitude < 20;

Now consider a case where "Temperature" is a stream. Contrary to the
classical case above, the query below never terminates as long as new
temperature data keep arriving:

    SELECT STREAM * FROM "Temperature" WHERE value > 20;

(We can see streams as "data in flight" and tables as "data at rest".)

Calcite can use a stream as a table and a table as a stream. Actually
"Temperature" is both: where to actually find the data is up to the
system. An example of the advantage of using the same data both as a
stream and as a table is to get the temperatures that are greater than
the average temperature of the previous year.
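
As an illustration, here is my own untested sketch of what such a query
might look like (it is not taken from the slides). The "rowtime"
timestamp column is an assumption, "previous year" is read as the
trailing 12 months, and I did not verify whether a given Calcite version
can actually plan this query:

    -- Outer query reads "Temperature" as a stream (new rows as they arrive).
    -- Inner query reads the same relation as a table (its history).
    -- "value" is quoted in case VALUE is a reserved word for the parser.
    SELECT STREAM *
    FROM "Temperature" AS t
    WHERE t."value" > (
        SELECT AVG(h."value")
        FROM "Temperature" AS h
        WHERE h.rowtime > t.rowtime - INTERVAL '1' YEAR);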

It is possible to use JOIN between a stream and a table (e.g. between
"Temperature" and "Sensors"); the result is a stream. The table may
change during stream execution. A JOIN between two streams is more
challenging.
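
A stream-to-table join could be sketched as below. This is only my
guess at the shape of such a query: the "sensorId", "id" and "rowtime"
columns are hypothetical names that do not come from the presentation.

    -- "Temperature" is read as a stream, "Sensors" as an ordinary table.
    -- The result is itself a stream: one output row per incoming
    -- temperature measured by a sensor located below 20 (altitude).
    SELECT STREAM t.rowtime, t."value", s.altitude
    FROM "Temperature" AS t
    JOIN "Sensors" AS s
      ON t.sensorId = s.id
    WHERE s.altitude < 20;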

Calcite provides window functions that can be used together with GROUP
BY for computing values based on neighbouring rows. Example: "For every
record, emit the average for the surrounding T seconds".
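
Two sketches of what this could look like, assuming T = 10 seconds and a
hypothetical "rowtime" timestamp column. Both are untested. Note that the
first one only looks at the preceding T seconds, not at rows on both
sides of the current record:

    -- Sliding window: one output row per input row, with the average
    -- of the values received during the 10 seconds before that row.
    SELECT STREAM rowtime, "value",
           AVG("value") OVER (ORDER BY rowtime
                              RANGE INTERVAL '10' SECOND PRECEDING)
               AS recentAverage
    FROM "Temperature";

    -- Tumbling window with GROUP BY: one output row per 10-second
    -- period, emitted when that period is over.
    SELECT STREAM TUMBLE_END(rowtime, INTERVAL '10' SECOND) AS windowEnd,
           AVG("value") AS average
    FROM "Temperature"
    GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND);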

    Martin

