http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_docs/stream.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/stream.md b/avatica/site/_docs/stream.md deleted file mode 100644 index f66fbce..0000000 --- a/avatica/site/_docs/stream.md +++ /dev/null @@ -1,1023 +0,0 @@ ---- -layout: docs -title: Streaming -permalink: /docs/stream.html ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Calcite has extended SQL and relational algebra in order to support -streaming queries. - -* TOC -{:toc} - -## Introduction - -Streams are collections to records that flow continuously, and forever. -Unlike tables, they are not typically stored on disk, but flow over the -network and are held for short periods of time in memory. - -Streams complement tables because they represent what is happening in the -present and future of the enterprise whereas tables represent the past. -It is very common for a stream to be archived into a table. - -Like tables, you often want to query streams in a high-level language -based on relational algebra, validated according to a schema, and optimized -to take advantage of available resources and algorithms. - -Calcite's SQL is an extension to standard SQL, not another 'SQL-like' language. -The distinction is important, for several reasons: - -* Streaming SQL is easy to learn for anyone who knows regular SQL. -* The semantics are clear, because we aim to produce the same results on a - stream as if the same data were in a table. -* You can write queries that combine streams and tables (or the history of - a stream, which is basically an in-memory table). -* Lots of existing tools can generate standard SQL. - -If you don't use the `STREAM` keyword, you are back in regular -standard SQL. - -## An example schema - -Our streaming SQL examples use the following schema: - -* `Orders (rowtime, productId, orderId, units)` - a stream and a table -* `Products (rowtime, productId, name)` - a table -* `Shipments (rowtime, orderId)` - a stream - -## A simple query - -Let's start with the simplest streaming query: - -{% highlight sql %} -SELECT STREAM * -FROM Orders; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 10:17:00 | 30 | 5 | 4 - 10:17:05 | 10 | 6 | 1 - 10:18:05 | 20 | 7 | 2 - 10:18:07 | 30 | 8 | 20 - 11:02:00 | 10 | 9 | 6 - 11:04:00 | 10 | 10 | 1 - 11:09:30 | 40 | 11 | 12 - 11:24:11 | 10 | 12 | 4 -{% endhighlight %} - -This query reads all columns and rows from the `Orders` stream. -Like any streaming query, it never terminates. It outputs a record whenever -a record arrives in `Orders`. - -Type `Control-C` to terminate the query. - -The `STREAM` keyword is the main extension in streaming SQL. It tells the -system that you are interested in incoming orders, not existing ones. The query - -{% highlight sql %} -SELECT * -FROM Orders; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 08:30:00 | 10 | 1 | 3 - 08:45:10 | 20 | 2 | 1 - 09:12:21 | 10 | 3 | 10 - 09:27:44 | 30 | 4 | 2 - -4 records returned. -{% endhighlight %} - -is also valid, but will print out all existing orders and then terminate. We -call it a *relational* query, as opposed to *streaming*. It has traditional -SQL semantics. - -`Orders` is special, in that it has both a stream and a table. If you try to run -a streaming query on a table, or a relational query on a stream, Calcite gives -an error: - -{% highlight sql %} -SELECT * FROM Shipments; - -ERROR: Cannot convert stream 'SHIPMENTS' to a table - -SELECT STREAM * FROM Products; - -ERROR: Cannot convert table 'PRODUCTS' to a stream -{% endhighlight %} - -# Filtering rows - -Just as in regular SQL, you use a `WHERE` clause to filter rows: - -{% highlight sql %} -SELECT STREAM * -FROM Orders -WHERE units > 3; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 10:17:00 | 30 | 5 | 4 - 10:18:07 | 30 | 8 | 20 - 11:02:00 | 10 | 9 | 6 - 11:09:30 | 40 | 11 | 12 - 11:24:11 | 10 | 12 | 4 -{% endhighlight %} - -# Projecting expressions - -Use expressions in the `SELECT` clause to choose which columns to return or -compute expressions: - -{% highlight sql %} -SELECT STREAM rowtime, - 'An order for ' || units || ' ' - || CASE units WHEN 1 THEN 'unit' ELSE 'units' END - || ' of product #' || productId AS description -FROM Orders; - - rowtime | description -----------+--------------------------------------- - 10:17:00 | An order for 4 units of product #30 - 10:17:05 | An order for 1 unit of product #10 - 10:18:05 | An order for 2 units of product #20 - 10:18:07 | An order for 20 units of product #30 - 11:02:00 | An order by 6 units of product #10 - 11:04:00 | An order by 1 unit of product #10 - 11:09:30 | An order for 12 units of product #40 - 11:24:11 | An order by 4 units of product #10 -{% endhighlight %} - -We recommend that you always include the `rowtime` column in the `SELECT` -clause. Having a sorted timestamp in each stream and streaming query makes it -possible to do advanced calculations later, such as `GROUP BY` and `JOIN`. - -# Tumbling windows - -There are several ways to compute aggregate functions on streams. The -differences are: - -* How many rows come out for each row in? -* Does each incoming value appear in one total, or more? -* What defines the "window", the set of rows that contribute to a given output row? -* Is the result a stream or a relation? - -There are various window types: - -* tumbling window (GROUP BY) -* hopping window (multi GROUP BY) -* sliding window (window functions) -* cascading window (window functions) - -and the following diagram shows the kinds of query in which to use them: - - - -First we'll look a *tumbling window*, which is defined by a streaming -`GROUP BY`. Here is an example: - -{% highlight sql %} -SELECT STREAM CEIL(rowtime TO HOUR) AS rowtime, - productId, - COUNT(*) AS c, - SUM(units) AS units -FROM Orders -GROUP BY CEIL(rowtime TO HOUR), productId; - - rowtime | productId | c | units -----------+-----------+---------+------- - 11:00:00 | 30 | 2 | 24 - 11:00:00 | 10 | 1 | 1 - 11:00:00 | 20 | 1 | 7 - 12:00:00 | 10 | 3 | 11 - 12:00:00 | 40 | 1 | 12 -{% endhighlight %} - -The result is a stream. At 11 o'clock, Calcite emits a sub-total for every -`productId` that had an order since 10 o'clock, timestamped 11 o'clock. -At 12 o'clock, it will emit -the orders that occurred between 11:00 and 12:00. Each input row contributes to -only one output row. - -How did Calcite know that the 10:00:00 sub-totals were complete at 11:00:00, -so that it could emit them? It knows that `rowtime` is increasing, and it knows -that `CEIL(rowtime TO HOUR)` is also increasing. So, once it has seen a row -at or after 11:00:00, it will never see a row that will contribute to a 10:00:00 -total. - -A column or expression that is increasing or decreasing is said to be -*monotonic*. - -If column or expression has values that are slightly out of order, -and the stream has a mechanism (such as punctuation or watermarks) -to declare that a particular value will never be seen again, then -the column or expression is said to be *quasi-monotonic*. - -Without a monotonic or quasi-monotonic expression in the `GROUP BY` clause, -Calcite is -not able to make progress, and it will not allow the query: - -{% highlight sql %} -SELECT STREAM productId, - COUNT(*) AS c, - SUM(units) AS units -FROM Orders -GROUP BY productId; - -ERROR: Streaming aggregation requires at least one monotonic expression in GROUP BY clause -{% endhighlight %} - -Monotonic and quasi-monotonic columns need to be declared in the schema. -The monotonicity is -enforced when records enter the stream and assumed by queries that read from -that stream. We recommend that you give each stream a timestamp column called -`rowtime`, but you can declare others to be monotonic, `orderId`, for example. - -We discuss punctuation, watermarks, and other ways of making progress -<a href="#punctuation">below</a>. - -# Tumbling windows, improved - -The previous example of tumbling windows was easy to write because the window -was one hour. For intervals that are not a whole time unit, say 2 hours or -2 hours and 17 minutes, you cannot use `CEIL`, and the expression gets more -complicated. - -Calcite supports an alternative syntax for tumbling windows: - -{% highlight sql %} -SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, - productId, - COUNT(*) AS c, - SUM(units) AS units -FROM Orders -GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId; - - rowtime | productId | c | units -----------+-----------+---------+------- - 11:00:00 | 30 | 2 | 24 - 11:00:00 | 10 | 1 | 1 - 11:00:00 | 20 | 1 | 7 - 12:00:00 | 10 | 3 | 11 - 12:00:00 | 40 | 1 | 12 -{% endhighlight %} - -As you can see, it returns the same results as the previous query. The `TUMBLE` -function returns a grouping key that is the same for all the rows that will end -up in a given summary row; the `TUMBLE_END` function takes the same arguments -and returns the time at which that window ends; -there is also a `TUMBLE_START` function. - -`TUMBLE` has an optional parameter to align the window. -In the following example, -we use a 30 minute interval and 0:12 as the alignment time, -so the query emits summaries at 12 and 42 minutes past each hour: - -{% highlight sql %} -SELECT STREAM - TUMBLE_END(rowtime, INTERVAL '30' MINUTE, TIME '0:12') AS rowtime, - productId, - COUNT(*) AS c, - SUM(units) AS units -FROM Orders -GROUP BY TUMBLE(rowtime, INTERVAL '30' MINUTE, TIME '0:12'), - productId; - - rowtime | productId | c | units -----------+-----------+---------+------- - 10:42:00 | 30 | 2 | 24 - 10:42:00 | 10 | 1 | 1 - 10:42:00 | 20 | 1 | 7 - 11:12:00 | 10 | 2 | 7 - 11:12:00 | 40 | 1 | 12 - 11:42:00 | 10 | 1 | 4 -{% endhighlight %} - -# Hopping windows - -Hopping windows are a generalization of tumbling windows that allow data to -be kept in a window for a longer than the emit interval. - -For example, the following query emits a row timestamped 11:00 containing data -from 08:00 to 11:00 (or 10:59.9 if we're being pedantic), -and a row timestamped 12:00 containing data from 09:00 -to 12:00. - -{% highlight sql %} -SELECT STREAM - HOP_END(rowtime, INTERVAL '1' HOUR, INTERVAL '3' HOUR) AS rowtime, - COUNT(*) AS c, - SUM(units) AS units -FROM Orders -GROUP BY HOP(rowtime, INTERVAL '1' HOUR, INTERVAL '3' HOUR); - - rowtime | c | units -----------+----------+------- - 11:00:00 | 4 | 27 - 12:00:00 | 8 | 50 -{% endhighlight %} - -In this query, because the retain period is 3 times the emit period, every input -row contributes to exactly 3 output rows. Imagine that the `HOP` function -generates a collection of group keys for incoming row, and places its values -in the accumulators of each of those group keys. For example, -`HOP(10:18:00, INTERVAL '1' HOUR, INTERVAL '3')` generates 3 periods - -```[08:00, 09:00) -[09:00, 10:00) -[10:00, 11:00) -``` - -This raises the possibility of allowing user-defined partitioning functions -for users who are not happy with the built-in functions `HOP` and `TUMBLE`. - -We can build complex complex expressions such as an exponentially decaying -moving average: - -{% highlight sql %} -SELECT STREAM HOP_END(rowtime), - productId, - SUM(unitPrice * EXP((rowtime - HOP_START(rowtime)) SECOND / INTERVAL '1' HOUR)) - / SUM(EXP((rowtime - HOP_START(rowtime)) SECOND / INTERVAL '1' HOUR)) -FROM Orders -GROUP BY HOP(rowtime, INTERVAL '1' SECOND, INTERVAL '1' HOUR), - productId -{% endhighlight %} - -Emits: - -* a row at `11:00:00` containing rows in `[10:00:00, 11:00:00)`; -* a row at `11:00:01` containing rows in `[10:00:01, 11:00:01)`. - -The expression weighs recent orders more heavily than older orders. -Extending the window from 1 hour to 2 hours or 1 year would have -virtually no effect on the accuracy of the result (but use more memory -and compute). - -Note that we use `HOP_START` inside an aggregate function (`SUM`) because it -is a value that is constant for all rows within a sub-total. This -would not be allowed for typical aggregate functions (`SUM`, `COUNT` -etc.). - -If you are familiar with `GROUPING SETS`, you may notice that partitioning -functions can be seen as a generalization of `GROUPING SETS`, in that they -allow an input row to contribute to multiple sub-totals. -The auxiliary functions for `GROUPING SETS`, -such as `GROUPING()` and `GROUP_ID`, -can be used inside aggregate functions, so it is not surprising that -`HOP_START` and `HOP_END` can be used in the same way. - -# GROUPING SETS - -`GROUPING SETS` is valid for a streaming query provided that every -grouping set contains a monotonic or quasi-monotonic expression. - -`CUBE` and `ROLLUP` are not valid for streaming query, because they will -produce at least one grouping set that aggregates everything (like -`GROUP BY ()`). - -# Filtering after aggregation - -As in standard SQL, you can apply a `HAVING` clause to filter rows emitted by -a streaming `GROUP BY`: - -{% highlight sql %} -SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, - productId -FROM Orders -GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId -HAVING COUNT(*) > 2 OR SUM(units) > 10; - - rowtime | productId -----------+----------- - 10:00:00 | 30 - 11:00:00 | 10 - 11:00:00 | 40 -{% endhighlight %} - -# Sub-queries, views and SQL's closure property - -The previous `HAVING` query can be expressed using a `WHERE` clause on a -sub-query: - -{% highlight sql %} -SELECT STREAM rowtime, productId -FROM ( - SELECT TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, - productId, - COUNT(*) AS c, - SUM(units) AS su - FROM Orders - GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId) -WHERE c > 2 OR su > 10; - - rowtime | productId -----------+----------- - 10:00:00 | 30 - 11:00:00 | 10 - 11:00:00 | 40 -{% endhighlight %} - -`HAVING` was introduced in the early days of SQL, when a way was needed to -perform a filter *after* aggregation. (Recall that `WHERE` filters rows before -they enter the `GROUP BY` clause.) - -Since then, SQL has become a mathematically closed language, which means that -any operation you can perform on a table can also perform on a query. - -The *closure property* of SQL is extremely powerful. Not only does it render -`HAVING` obsolete (or, at least, reduce it to syntactic sugar), it makes views -possible: - -{% highlight sql %} -CREATE VIEW HourlyOrderTotals (rowtime, productId, c, su) AS - SELECT TUMBLE_END(rowtime, INTERVAL '1' HOUR), - productId, - COUNT(*), - SUM(units) - FROM Orders - GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId; - -SELECT STREAM rowtime, productId -FROM HourlyOrderTotals -WHERE c > 2 OR su > 10; - - rowtime | productId -----------+----------- - 10:00:00 | 30 - 11:00:00 | 10 - 11:00:00 | 40 -{% endhighlight %} - -Sub-queries in the `FROM` clause are sometimes referred to as "inline views", -but really, they are more fundamental than views. Views are just a convenient -way to carve your SQL into manageable chunks by giving the pieces names and -storing them in the metadata repository. - -Many people find that nested queries and views are even more useful on streams -than they are on relations. Streaming queries are pipelines of -operators all running continuously, and often those pipelines get quite long. -Nested queries and views help to express and manage those pipelines. - -And, by the way, a `WITH` clause can accomplish the same as a sub-query or -a view: - -{% highlight sql %} -WITH HourlyOrderTotals (rowtime, productId, c, su) AS ( - SELECT TUMBLE_END(rowtime, INTERVAL '1' HOUR), - productId, - COUNT(*), - SUM(units) - FROM Orders - GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId) -SELECT STREAM rowtime, productId -FROM HourlyOrderTotals -WHERE c > 2 OR su > 10; - - rowtime | productId -----------+----------- - 10:00:00 | 30 - 11:00:00 | 10 - 11:00:00 | 40 -{% endhighlight %} - -# Converting between streams and relations - -Look back at the definition of the `HourlyOrderTotals` view. -Is the view a stream or a relation? - -It does not contain the `STREAM` keyword, so it is a relation. -However, it is a relation that can be converted into a stream. - -You can use it in both relational and streaming queries: - -{% highlight sql %} -# A relation; will query the historic Orders table. -# Returns the largest number of product #10 ever sold in one hour. -SELECT max(su) -FROM HourlyOrderTotals -WHERE productId = 10; - -# A stream; will query the Orders stream. -# Returns every hour in which at least one product #10 was sold. -SELECT STREAM rowtime -FROM HourlyOrderTotals -WHERE productId = 10; -{% endhighlight %} - -This approach is not limited to views and sub-queries. -Following the approach set out in CQL [<a href="#ref1">1</a>], every query -in streaming SQL is defined as a relational query and converted to a stream -using the `STREAM` keyword in the top-most `SELECT`. - -If the `STREAM` keyword is present in sub-queries or view definitions, it has no -effect. - -At query preparation time, Calcite figures out whether the relations referenced -in the query can be converted to streams or historical relations. - -Sometimes a stream makes available some of its history (say the last 24 hours of -data in an Apache Kafka [<a href="#ref2">2</a>] topic) -but not all. At run time, Calcite figures out whether there is sufficient -history to run the query, and if not, gives an error. - -# The "pie chart" problem: Relational queries on streams - -One particular case where you need to convert a stream to a relation -occurs in what I call the "pie chart problem". Imagine that you need to -write a web page with a chart, like the following, that summarizes the -number of orders for each product over the last hour. - - - -But the `Orders` stream only contains a few records, not an hour's summary. -We need to run a relational query on the history of the stream: - -{% highlight sql %} -SELECT productId, count(*) -FROM Orders -WHERE rowtime BETWEEN current_timestamp - INTERVAL '1' HOUR - AND current_timestamp; -{% endhighlight %} - -If the history of the `Orders` stream is being spooled to the `Orders` table, -we can answer the query, albeit at a high cost. Better, if we can tell the -system to materialize one hour summary into a table, -maintain it continuously as the stream flows, -and automatically rewrite queries to use the table. - -# Sorting - -The story for `ORDER BY` is similar to `GROUP BY`. -The syntax looks like regular SQL, but Calcite must be sure that it can deliver -timely results. It therefore requires a monotonic expression on the leading edge -of your `ORDER BY` key. - -{% highlight sql %} -SELECT STREAM CEIL(rowtime TO hour) AS rowtime, productId, orderId, units -FROM Orders -ORDER BY CEIL(rowtime TO hour) ASC, units DESC; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 10:00:00 | 30 | 8 | 20 - 10:00:00 | 30 | 5 | 4 - 10:00:00 | 20 | 7 | 2 - 10:00:00 | 10 | 6 | 1 - 11:00:00 | 40 | 11 | 12 - 11:00:00 | 10 | 9 | 6 - 11:00:00 | 10 | 12 | 4 - 11:00:00 | 10 | 10 | 1 -{% endhighlight %} - -Most queries will return results in the order that they were inserted, -because the engine is using streaming algorithms, but you should not rely on it. -For example, consider this: - -{% highlight sql %} -SELECT STREAM * -FROM Orders -WHERE productId = 10 -UNION ALL -SELECT STREAM * -FROM Orders -WHERE productId = 30; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 10:17:05 | 10 | 6 | 1 - 10:17:00 | 30 | 5 | 4 - 10:18:07 | 30 | 8 | 20 - 11:02:00 | 10 | 9 | 6 - 11:04:00 | 10 | 10 | 1 - 11:24:11 | 10 | 12 | 4 -{% endhighlight %} - -The rows with `productId` = 30 are apparently out of order, probably because -the `Orders` stream was partitioned on `productId` and the partitioned streams -sent their data at different times. - -If you require a particular ordering, add an explicit `ORDER BY`: - -{% highlight sql %} -SELECT STREAM * -FROM Orders -WHERE productId = 10 -UNION ALL -SELECT STREAM * -FROM Orders -WHERE productId = 30 -ORDER BY rowtime; - - rowtime | productId | orderId | units -----------+-----------+---------+------- - 10:17:00 | 30 | 5 | 4 - 10:17:05 | 10 | 6 | 1 - 10:18:07 | 30 | 8 | 20 - 11:02:00 | 10 | 9 | 6 - 11:04:00 | 10 | 10 | 1 - 11:24:11 | 10 | 12 | 4 -{% endhighlight %} - -Calcite will probably implement the `UNION ALL` by merging using `rowtime`, -which is only slightly less efficient. - -You only need to add an `ORDER BY` to the outermost query. If you need to, -say, perform `GROUP BY` after a `UNION ALL`, Calcite will add an `ORDER BY` -implicitly, in order to make the GROUP BY algorithm possible. - -# Table constructor - -The `VALUES` clause creates an inline table with a given set of rows. - -Streaming is disallowed. The set of rows never changes, and therefore a stream -would never return any rows. - -{% highlight sql %} -> SELECT STREAM * FROM (VALUES (1, 'abc')); - -ERROR: Cannot stream VALUES -{% endhighlight %} - -# Sliding windows - -Standard SQL features so-called "analytic functions" that can be used in the -`SELECT` clause. Unlike `GROUP BY`, these do not collapse records. For each -record that goes in, one record comes out. But the aggregate function is based -on a window of many rows. - -Let's look at an example. - -{% highlight sql %} -SELECT STREAM rowtime, - productId, - units, - SUM(units) OVER (ORDER BY rowtime RANGE INTERVAL '1' HOUR PRECEDING) unitsLastHour -FROM Orders; -{% endhighlight %} - -The feature packs a lot of power with little effort. You can have multiple -functions in the `SELECT` clause, based on multiple window specifications. - -The following example returns orders whose average order size over the last -10 minutes is greater than the average order size for the last week. - -{% highlight sql %} -SELECT STREAM * -FROM ( - SELECT STREAM rowtime, - productId, - units, - AVG(units) OVER product (RANGE INTERVAL '10' MINUTE PRECEDING) AS m10, - AVG(units) OVER product (RANGE INTERVAL '7' DAY PRECEDING) AS d7 - FROM Orders - WINDOW product AS ( - ORDER BY rowtime - PARTITION BY productId)) -WHERE m10 > d7; -{% endhighlight %} - -For conciseness, here we use a syntax where you partially define a window -using a `WINDOW` clause and then refine the window in each `OVER` clause. -You could also define all windows in the `WINDOW` clause, or all windows inline, -if you wish. - -But the real power goes beyond syntax. Behind the scenes, this query is -maintaining two tables, and adding and removing values from sub-totals using -with FIFO queues. But you can access those tables without introducing a join -into the query. - -Some other features of the windowed aggregation syntax: - -* You can define windows based on row count. -* The window can reference rows that have not yet arrived. - (The stream will wait until they have arrived). -* You can compute order-dependent functions such as `RANK` and median. - -# Cascading windows - -What if we want a query that returns a result for every record, like a -sliding window, but resets totals on a fixed time period, like a -tumbling window? Such a pattern is called a *cascading window*. Here -is an example: - -{% highlight sql %} -SELECT STREAM rowtime, - productId, - units, - SUM(units) OVER (PARTITION BY FLOOR(rowtime TO HOUR)) AS unitsSinceTopOfHour -FROM Orders; -{% endhighlight %} - -It looks similar to a sliding window query, but the monotonic -expression occurs within the `PARTITION BY` clause of the window. As -the rowtime moves from from 10:59:59 to 11:00:00, -`FLOOR(rowtime TO HOUR)` changes from 10:00:00 to 11:00:00, -and therefore a new partition starts. -The first row to arrive in the new hour will start a -new total; the second row will have a total that consists of two rows, -and so on. - -Calcite knows that the old partition will never be used again, so -removes all sub-totals for that partition from its internal storage. - -Analytic functions that using cascading and sliding windows can be -combined in the same query. - -# Joining streams to tables - -There are two kinds of join where streams are concerned: stream-to-table -join and stream-to-stream join. - -A stream-to-table join is straightforward if the contents of the table -are not changing. This query enriches a stream of orders with -each product's list price: - -{% highlight sql %} -SELECT STREAM o.rowtime, o.productId, o.orderId, o.units, - p.name, p.unitPrice -FROM Orders AS o -JOIN Products AS p - ON o.productId = p.productId; - - rowtime | productId | orderId | units | name | unitPrice -----------+-----------+---------+-------+ -------+----------- - 10:17:00 | 30 | 5 | 4 | Cheese | 17 - 10:17:05 | 10 | 6 | 1 | Beer | 0.25 - 10:18:05 | 20 | 7 | 2 | Wine | 6 - 10:18:07 | 30 | 8 | 20 | Cheese | 17 - 11:02:00 | 10 | 9 | 6 | Beer | 0.25 - 11:04:00 | 10 | 10 | 1 | Beer | 0.25 - 11:09:30 | 40 | 11 | 12 | Bread | 100 - 11:24:11 | 10 | 12 | 4 | Beer | 0.25 -{% endhighlight %} - -What should happen if the table is changing? For example, -suppose the unit price of product 10 is increased to 0.35 at 11:00. -Orders placed before 11:00 should have the old price, and orders -placed after 11:00 should reflect the new price. - -One way to implement this is to have a table that keeps every version -with a start and end effective date, `ProductVersions` in the following -example: - -{% highlight sql %} -SELECT STREAM * -FROM Orders AS o -JOIN ProductVersions AS p - ON o.productId = p.productId - AND o.rowtime BETWEEN p.startDate AND p.endDate - - rowtime | productId | orderId | units | productId1 | name | unitPrice -----------+-----------+---------+-------+ -----------+--------+----------- - 10:17:00 | 30 | 5 | 4 | 30 | Cheese | 17 - 10:17:05 | 10 | 6 | 1 | 10 | Beer | 0.25 - 10:18:05 | 20 | 7 | 2 | 20 | Wine | 6 - 10:18:07 | 30 | 8 | 20 | 30 | Cheese | 17 - 11:02:00 | 10 | 9 | 6 | 10 | Beer | 0.35 - 11:04:00 | 10 | 10 | 1 | 10 | Beer | 0.35 - 11:09:30 | 40 | 11 | 12 | 40 | Bread | 100 - 11:24:11 | 10 | 12 | 4 | 10 | Beer | 0.35 -{% endhighlight %} - -The other way to implement this is to use a database with temporal support -(the ability to find the contents of the database as it was at any moment -in the past), and the system needs to know that the `rowtime` column of -the `Orders` stream corresponds to the transaction timestamp of the -`Products` table. - -For many applications, it is not worth the cost and effort of temporal -support or a versioned table. It is acceptable to the application that -the query gives different results when replayed: in this example, on replay, -all orders of product 10 are assigned the later unit price, 0.35. - -# Joining streams to streams - -It makes sense to join two streams if the join condition somehow forces -them to remain a finite distance from one another. In the following query, -the ship date is within one hour of the order date: - -{% highlight sql %} -SELECT STREAM o.rowtime, o.productId, o.orderId, s.rowtime AS shipTime -FROM Orders AS o -JOIN Shipments AS s - ON o.orderId = p.orderId - AND s.rowtime BETWEEN o.rowtime AND o.rowtime + INTERVAL '1' HOUR; - - rowtime | productId | orderId | shipTime -----------+-----------+---------+---------- - 10:17:00 | 30 | 5 | 10:55:00 - 10:17:05 | 10 | 6 | 10:20:00 - 11:02:00 | 10 | 9 | 11:58:00 - 11:24:11 | 10 | 12 | 11:44:00 -{% endhighlight %} - -Note that quite a few orders do not appear, because they did not ship -within an hour. By the time the system receives order 10, timestamped 11:24:11, -it has already removed orders up to and including order 8, timestamped 10:18:07, -from its hash table. - -As you can see, the "lock step", tying together monotonic or quasi-monotonic -columns of the two streams, is necessary for the system to make progress. -It will refuse to execute a query if it cannot deduce a lock step. - -# DML - -It's not only queries that make sense against streams; -it also makes sense to run DML statements (`INSERT`, `UPDATE`, `DELETE`, -and also their rarer cousins `UPSERT` and `REPLACE`) against streams. - -DML is useful because it allows you do materialize streams -or tables based on streams, -and therefore save effort when values are used often. - -Consider how streaming applications often consist of pipelines of queries, -each query transforming input stream(s) to output stream(s). -The component of a pipeline can be a view: - -{% highlight sql %} -CREATE VIEW LargeOrders AS -SELECT STREAM * FROM Orders WHERE units > 1000; -{% endhighlight %} - -or a standing `INSERT` statement: - -{% highlight sql %} -INSERT INTO LargeOrders -SELECT STREAM * FROM Orders WHERE units > 1000; -{% endhighlight %} - -These look similar, and in both cases the next step(s) in the pipeline -can read from `LargeOrders` without worrying how it was populated. -There is a difference in efficiency: the `INSERT` statement does the -same work no matter how many consumers there are; the view does work -proportional to the number of consumers, and in particular, does no -work if there are no consumers. - -Other forms of DML make sense for streams. For example, the following -standing `UPSERT` statement maintains a table that materializes a summary -of the last hour of orders: - -{% highlight sql %} -UPSERT INTO OrdersSummary -SELECT STREAM productId, - COUNT(*) OVER lastHour AS c -FROM Orders -WINDOW lastHour AS ( - PARTITION BY productId - ORDER BY rowtime - RANGE INTERVAL '1' HOUR PRECEDING) -{% endhighlight %} - -# Punctuation - -Punctuation[<a href="#ref5">5</a>] allows a stream query to make progress -even if there are not enough values in a monotonic key to push the results out. - -(I prefer the term "rowtime bounds", -and watermarks[<a href="#ref6">6</a>] are a related concept, -but for these purposes, punctuation will suffice.) - -If a stream has punctuation enabled then it may not be sorted but is -nevertheless sortable. So, for the purposes of semantics, it is sufficient -to work in terms of sorted streams. - -By the way, an out-of-order stream is also sortable if it is *t-sorted* -(i.e. every record is guaranteed to arrive within *t* seconds of its -timestamp) or *k-sorted* (i.e. every record is guaranteed to be no more -than *k* positions out of order). So queries on these streams can be -planned similarly to queries on streams with punctuation. - -And, we often want to aggregate over attributes that are not -time-based but are nevertheless monotonic. "The number of times a team -has shifted between winning-state and losing-state" is one such -monotonic attribute. The system needs to figure out for itself that it -is safe to aggregate over such an attribute; punctuation does not add -any extra information. - -I have in mind some metadata (cost metrics) for the planner: - -1. Is this stream sorted on a given attribute (or attributes)? -2. Is it possible to sort the stream on a given attribute? (For finite - relations, the answer is always "yes"; for streams it depends on the - existence of punctuation, or linkage between the attributes and the - sort key.) -3. What latency do we need to introduce in order to perform that sort? -4. What is the cost (in CPU, memory etc.) of performing that sort? - -We already have (1), in [BuiltInMetadata.Collation]({{ site.apiRoot }}/org/apache/calcite/rel/metadata/BuiltInMetadata.Collation.html). -For (2), the answer is always "true" for finite relations. -But we'll need to implement (2), (3) and (4) for streams. - -# State of the stream - -Not all concepts in this article have been implemented in Calcite. -And others may be implemented in Calcite but not in a particular adapter -such as SamzaSQL [<a href="#ref3">3</a>] [<a href="#ref4">4</a>]. - -## Implemented - -* Streaming `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, `UNION ALL`, `ORDER BY` -* `FLOOR` and `CEIL` functions -* Monotonicity -* Streaming `VALUES` is disallowed - -## Not implemented - -The following features are presented in this document as if Calcite -supports them, but in fact it does not (yet). Full support means -that the reference implementation supports the feature (including -negative cases) and the TCK tests it. - -* Stream-to-stream `JOIN` -* Stream-to-table `JOIN` -* Stream on view -* Streaming `UNION ALL` with `ORDER BY` (merge) -* Relational query on stream -* Streaming windowed aggregation (sliding and cascading windows) -* Check that `STREAM` in sub-queries and views is ignored -* Check that streaming `ORDER BY` cannot have `OFFSET` or `LIMIT` -* Limited history; at run time, check that there is sufficient history - to run the query. -* [Quasi-monotonicity](https://issues.apache.org/jira/browse/CALCITE-1096) -* `HOP` and `TUMBLE` (and auxiliary `HOP_START`, `HOP_END`, - `TUMBLE_START`, `TUMBLE_END`) functions - -## To do in this document - -* Re-visit whether you can stream `VALUES` -* `OVER` clause to define window on stream -* Consider whether to allow `CUBE` and `ROLLUP` in streaming queries, - with an understanding that some levels of aggregation will never complete - (because they have no monotonic expressions) and thus will never be emitted. -* Fix the `UPSERT` example to remove records for products that have not - occurred in the last hour. -* DML that outputs to multiple streams; perhaps an extension to the standard - `REPLACE` statement. - -# Functions - -The following functions are not present in standard SQL -but are defined in streaming SQL. - -Scalar functions: - -* `FLOOR(dateTime TO intervalType)` rounds a date, time or timestamp value - down to a given interval type -* `CEIL(dateTime TO intervalType)` rounds a date, time or timestamp value - up to a given interval type - -Partitioning functions: - -* `HOP(t, emit, retain)` returns a collection of group keys for a row - to be part of a hopping window -* `HOP(t, emit, retain, align)` returns a collection of group keys for a row - to be part of a hopping window with a given alignment -* `TUMBLE(t, emit)` returns a group key for a row - to be part of a tumbling window -* `TUMBLE(t, emit, align)` returns a group key for a row - to be part of a tumbling window with a given alignment - -`TUMBLE(t, e)` is equivalent to `TUMBLE(t, e, TIME '00:00:00')`. - -`TUMBLE(t, e, a)` is equivalent to `HOP(t, e, e, a)`. - -`HOP(t, e, r)` is equivalent to `HOP(t, e, r, TIME '00:00:00')`. - -# References - -* [<a name="ref1">1</a>] - <a href="http://ilpubs.stanford.edu:8090/758/">Arvind Arasu, Shivnath Babu, - and Jennifer Widom (2003) The CQL Continuous Query - Language: Semantic Foundations and Query Execution</a>. -* [<a name="ref2">2</a>] - <a href="http://kafka.apache.org/documentation.html">Apache Kafka</a>. -* [<a name="ref3">3</a>] <a href="http://samza.apache.org">Apache Samza</a>. -* [<a name="ref4">4</a>] <a href="https://github.com/milinda/samza-sql">SamzaSQL</a>. -* [<a name="ref5">5</a>] - <a href="http://www.whitworth.edu/academic/department/mathcomputerscience/faculty/tuckerpeter/pdf/117896_final.pdf">Peter - A. Tucker, David Maier, Tim Sheard, and Leonidas Fegaras (2003) Exploiting - Punctuation Semantics in Continuous Data Streams</a>. -* [<a name="ref6">6</a>] - <a href="http://research.google.com/pubs/pub41378.html">Tyler Akidau, - Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, - Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle (2013) - MillWheel: Fault-Tolerant Stream Processing at Internet Scale</a>.
http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_docs/tutorial.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/tutorial.md b/avatica/site/_docs/tutorial.md deleted file mode 100644 index 73f8f8f..0000000 --- a/avatica/site/_docs/tutorial.md +++ /dev/null @@ -1,760 +0,0 @@ ---- -layout: docs -title: Tutorial -permalink: /docs/tutorial.html ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -This is a step-by-step tutorial that shows how to build and connect to -Calcite. It uses a simple adapter that makes a directory of CSV files -appear to be a schema containing tables. Calcite does the rest, and -provides a full SQL interface. - -Calcite-example-CSV is a fully functional adapter for -Calcite that reads -text files in -<a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV -(comma-separated values)</a> format. It is remarkable that a couple of -hundred lines of Java code are sufficient to provide full SQL query -capability. - -CSV also serves as a template for building adapters to other -data formats. Even though there are not many lines of code, it covers -several important concepts: - -* user-defined schema using SchemaFactory and Schema interfaces; -* declaring schemas in a model JSON file; -* declaring views in a model JSON file; -* user-defined table using the Table interface; -* determining the record type of a table; -* a simple implementation of Table, using the ScannableTable interface, that - enumerates all rows directly; -* a more advanced implementation that implements FilterableTable, and can - filter out rows according to simple predicates; -* advanced implementation of Table, using TranslatableTable, that translates - to relational operators using planner rules. - -## Download and build - -You need Java (1.7 or higher; 1.8 preferred), git and maven (3.2.1 or later). - -{% highlight bash %} -$ git clone https://github.com/apache/calcite.git -$ cd calcite -$ mvn install -DskipTests -Dcheckstyle.skip=true -$ cd example/csv -{% endhighlight %} - -## First queries - -Now let's connect to Calcite using -<a href="https://github.com/julianhyde/sqlline">sqlline</a>, a SQL shell -that is included in this project. - -{% highlight bash %} -$ ./sqlline -sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin -{% endhighlight %} - -(If you are running Windows, the command is `sqlline.bat`.) - -Execute a metadata query: - -{% highlight bash %} -sqlline> !tables -+------------+--------------+-------------+---------------+----------+------+ -| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE | -+------------+--------------+-------------+---------------+----------+------+ -| null | SALES | DEPTS | TABLE | null | null | -| null | SALES | EMPS | TABLE | null | null | -| null | SALES | HOBBIES | TABLE | null | null | -| null | metadata | COLUMNS | SYSTEM_TABLE | null | null | -| null | metadata | TABLES | SYSTEM_TABLE | null | null | -+------------+--------------+-------------+---------------+----------+------+ -{% endhighlight %} - -(JDBC experts, note: sqlline's <code>!tables</code> command is just executing -<a href="http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[])"><code>DatabaseMetaData.getTables()</code></a> -behind the scenes. -It has other commands to query JDBC metadata, such as <code>!columns</code> and <code>!describe</code>.) - -As you can see there are 5 tables in the system: tables -<code>EMPS</code>, <code>DEPTS</code> and <code>HOBBIES</code> in the current -<code>SALES</code> schema, and <code>COLUMNS</code> and -<code>TABLES</code> in the system <code>metadata</code> schema. The -system tables are always present in Calcite, but the other tables are -provided by the specific implementation of the schema; in this case, -the <code>EMPS</code> and <code>DEPTS</code> tables are based on the -<code>EMPS.csv</code> and <code>DEPTS.csv</code> files in the -<code>target/test-classes</code> directory. - -Let's execute some queries on those tables, to show that Calcite is providing -a full implementation of SQL. First, a table scan: - -{% highlight bash %} -sqlline> SELECT * FROM emps; -+--------+--------+---------+---------+----------------+--------+-------+---+ -| EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | S | -+--------+--------+---------+---------+----------------+--------+-------+---+ -| 100 | Fred | 10 | | | 30 | 25 | t | -| 110 | Eric | 20 | M | San Francisco | 3 | 80 | n | -| 110 | John | 40 | M | Vancouver | 2 | null | f | -| 120 | Wilma | 20 | F | | 1 | 5 | n | -| 130 | Alice | 40 | F | Vancouver | 2 | null | f | -+--------+--------+---------+---------+----------------+--------+-------+---+ -{% endhighlight %} - -Now JOIN and GROUP BY: - -{% highlight bash %} -sqlline> SELECT d.name, COUNT(*) -. . . .> FROM emps AS e JOIN depts AS d ON e.deptno = d.deptno -. . . .> GROUP BY d.name; -+------------+---------+ -| NAME | EXPR$1 | -+------------+---------+ -| Sales | 1 | -| Marketing | 2 | -+------------+---------+ -{% endhighlight %} - -Last, the VALUES operator generates a single row, and is a convenient -way to test expressions and SQL built-in functions: - -{% highlight bash %} -sqlline> VALUES CHAR_LENGTH('Hello, ' || 'world!'); -+---------+ -| EXPR$0 | -+---------+ -| 13 | -+---------+ -{% endhighlight %} - -Calcite has many other SQL features. We don't have time to cover them -here. Write some more queries to experiment. - -## Schema discovery - -Now, how did Calcite find these tables? Remember, core Calcite does not -know anything about CSV files. (As a "database without a storage -layer", Calcite doesn't know about any file formats.) Calcite knows about -those tables because we told it to run code in the calcite-example-csv -project. - -There are a couple of steps in that chain. First, we define a schema -based on a schema factory class in a model file. Then the schema -factory creates a schema, and the schema creates several tables, each -of which knows how to get data by scanning a CSV file. Last, after -Calcite has parsed the query and planned it to use those tables, Calcite -invokes the tables to read the data as the query is being -executed. Now let's look at those steps in more detail. - -On the JDBC connect string we gave the path of a model in JSON -format. Here is the model: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'SALES', - schemas: [ - { - name: 'SALES', - type: 'custom', - factory: 'org.apache.calcite.adapter.csv.CsvSchemaFactory', - operand: { - directory: 'target/test-classes/sales' - } - } - ] -} -{% endhighlight %} - -The model defines a single schema called 'SALES'. The schema is -powered by a plugin class, -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory.java">org.apache.calcite.adapter.csv.CsvSchemaFactory</a>, -which is part of the -calcite-example-csv project and implements the Calcite interface -<a href="{{ site.apiRoot }}/org/apache/calcite/schema/SchemaFactory.html">SchemaFactory</a>. -Its <code>create</code> method instantiates a -schema, passing in the <code>directory</code> argument from the model file: - -{% highlight java %} -public Schema create(SchemaPlus parentSchema, String name, - Map<String, Object> operand) { - String directory = (String) operand.get("directory"); - String flavorName = (String) operand.get("flavor"); - CsvTable.Flavor flavor; - if (flavorName == null) { - flavor = CsvTable.Flavor.SCANNABLE; - } else { - flavor = CsvTable.Flavor.valueOf(flavorName.toUpperCase()); - } - return new CsvSchema( - new File(directory), - flavor); -} -{% endhighlight %} - -Driven by the model, the schema factory instantiates a single schema -called 'SALES'. The schema is an instance of -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchema.java">org.apache.calcite.adapter.csv.CsvSchema</a> -and implements the Calcite interface -<a href="{{ site.apiRoot }}/org/apache/calcite/schema/Schema.html">Schema</a>. - -A schema's job is to produce a list of tables. (It can also list sub-schemas and -table-functions, but these are advanced features and calcite-example-csv does -not support them.) The tables implement Calcite's -<a href="{{ site.apiRoot }}/org/apache/calcite/schema/Table.html">Table</a> -interface. <code>CsvSchema</code> produces tables that are instances of -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTable.java">CsvTable</a> -and its sub-classes. - -Here is the relevant code from <code>CsvSchema</code>, overriding the -<code><a href="{{ site.apiRoot }}/org/apache/calcite/schema/impl/AbstractSchema.html#getTableMap()">getTableMap()</a></code> -method in the <code>AbstractSchema</code> base class. - -{% highlight java %} -protected Map<String, Table> getTableMap() { - // Look for files in the directory ending in ".csv", ".csv.gz", ".json", - // ".json.gz". - File[] files = directoryFile.listFiles( - new FilenameFilter() { - public boolean accept(File dir, String name) { - final String nameSansGz = trim(name, ".gz"); - return nameSansGz.endsWith(".csv") - || nameSansGz.endsWith(".json"); - } - }); - if (files == null) { - System.out.println("directory " + directoryFile + " not found"); - files = new File[0]; - } - // Build a map from table name to table; each file becomes a table. - final ImmutableMap.Builder<String, Table> builder = ImmutableMap.builder(); - for (File file : files) { - String tableName = trim(file.getName(), ".gz"); - final String tableNameSansJson = trimOrNull(tableName, ".json"); - if (tableNameSansJson != null) { - JsonTable table = new JsonTable(file); - builder.put(tableNameSansJson, table); - continue; - } - tableName = trim(tableName, ".csv"); - final Table table = createTable(file); - builder.put(tableName, table); - } - return builder.build(); -} - -/** Creates different sub-type of table based on the "flavor" attribute. */ -private Table createTable(File file) { - switch (flavor) { - case TRANSLATABLE: - return new CsvTranslatableTable(file, null); - case SCANNABLE: - return new CsvScannableTable(file, null); - case FILTERABLE: - return new CsvFilterableTable(file, null); - default: - throw new AssertionError("Unknown flavor " + flavor); - } -} -{% endhighlight %} - -The schema scans the directory and finds all files whose name ends -with ".csv" and creates tables for them. In this case, the directory -is <code>target/test-classes/sales</code> and contains files -<code>EMPS.csv</code> and <code>DEPTS.csv</code>, which these become -the tables <code>EMPS</code> and <code>DEPTS</code>. - -## Tables and views in schemas - -Note how we did not need to define any tables in the model; the schema -generated the tables automatically. - -You can define extra tables, -beyond those that are created automatically, -using the <code>tables</code> property of a schema. - -Let's see how to create -an important and useful type of table, namely a view. - -A view looks like a table when you are writing a query, but it doesn't store data. -It derives its result by executing a query. -The view is expanded while the query is being planned, so the query planner -can often perform optimizations like removing expressions from the SELECT -clause that are not used in the final result. - -Here is a schema that defines a view: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'SALES', - schemas: [ - { - name: 'SALES', - type: 'custom', - factory: 'org.apache.calcite.adapter.csv.CsvSchemaFactory', - operand: { - directory: 'target/test-classes/sales' - }, - tables: [ - { - name: 'FEMALE_EMPS', - type: 'view', - sql: 'SELECT * FROM emps WHERE gender = \'F\'' - } - ] - } - ] -} -{% endhighlight %} - -The line <code>type: 'view'</code> tags <code>FEMALE_EMPS</code> as a view, -as opposed to a regular table or a custom table. -Note that single-quotes within the view definition are escaped using a -back-slash, in the normal way for JSON. - -JSON doesn't make it easy to author long strings, so Calcite supports an -alternative syntax. If your view has a long SQL statement, you can instead -supply a list of lines rather than a single string: - -{% highlight json %} -{ - name: 'FEMALE_EMPS', - type: 'view', - sql: [ - 'SELECT * FROM emps', - 'WHERE gender = \'F\'' - ] -} -{% endhighlight %} - -Now we have defined a view, we can use it in queries just as if it were a table: - -{% highlight sql %} -sqlline> SELECT e.name, d.name FROM female_emps AS e JOIN depts AS d on e.deptno = d.deptno; -+--------+------------+ -| NAME | NAME | -+--------+------------+ -| Wilma | Marketing | -+--------+------------+ -{% endhighlight %} - -## Custom tables - -Custom tables are tables whose implementation is driven by user-defined code. -They don't need to live in a custom schema. - -There is an example in <code>model-with-custom-table.json</code>: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'CUSTOM_TABLE', - schemas: [ - { - name: 'CUSTOM_TABLE', - tables: [ - { - name: 'EMPS', - type: 'custom', - factory: 'org.apache.calcite.adapter.csv.CsvTableFactory', - operand: { - file: 'target/test-classes/sales/EMPS.csv.gz', - flavor: "scannable" - } - } - ] - } - ] -} -{% endhighlight %} - -We can query the table in the usual way: - -{% highlight sql %} -sqlline> !connect jdbc:calcite:model=target/test-classes/model-with-custom-table.json admin admin -sqlline> SELECT empno, name FROM custom_table.emps; -+--------+--------+ -| EMPNO | NAME | -+--------+--------+ -| 100 | Fred | -| 110 | Eric | -| 110 | John | -| 120 | Wilma | -| 130 | Alice | -+--------+--------+ -{% endhighlight %} - -The schema is a regular one, and contains a custom table powered by -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableFactory.java">org.apache.calcite.adapter.csv.CsvTableFactory</a>, -which implements the Calcite interface -<a href="{{ site.apiRoot }}/org/apache/calcite/schema/TableFactory.html">TableFactory</a>. -Its <code>create</code> method instantiates a <code>CsvScannableTable</code>, -passing in the <code>file</code> argument from the model file: - -{% highlight java %} -public CsvTable create(SchemaPlus schema, String name, - Map<String, Object> map, RelDataType rowType) { - String fileName = (String) map.get("file"); - final File file = new File(fileName); - final RelProtoDataType protoRowType = - rowType != null ? RelDataTypeImpl.proto(rowType) : null; - return new CsvScannableTable(file, protoRowType); -} -{% endhighlight %} - -Implementing a custom table is often a simpler alternative to implementing -a custom schema. Both approaches might end up creating a similar implementation -of the <code>Table</code> interface, but for the custom table you don't -need to implement metadata discovery. (<code>CsvTableFactory</code> -creates a <code>CsvScannableTable</code>, just as <code>CsvSchema</code> does, -but the table implementation does not scan the filesystem for .csv files.) - -Custom tables require more work for the author of the model (the author -needs to specify each table and its file explicitly) but also give the author -more control (say, providing different parameters for each table). - -## Comments in models - -Models can include comments using `/* ... */` and `//` syntax: - -{% highlight json %} -{ - version: '1.0', - /* Multi-line - comment. */ - defaultSchema: 'CUSTOM_TABLE', - // Single-line comment. - schemas: [ - .. - ] -} -{% endhighlight %} - -(Comments are not standard JSON, but are a harmless extension.) - -## Optimizing queries using planner rules - -The table implementations we have seen so far are fine as long as the tables -don't contain a great deal of data. But if your customer table has, say, a -hundred columns and a million rows, you would rather that the system did not -retrieve all of the data for every query. You would like Calcite to negotiate -with the adapter and find a more efficient way of accessing the data. - -This negotiation is a simple form of query optimization. Calcite supports query -optimization by adding <i>planner rules</i>. Planner rules operate by -looking for patterns in the query parse tree (for instance a project on top -of a certain kind of table), and - -Planner rules are also extensible, like schemas and tables. So, if you have a -data store that you want to access via SQL, you first define a custom table or -schema, and then you define some rules to make the access efficient. - -To see this in action, let's use a planner rule to access -a subset of columns from a CSV file. Let's run the same query against two very -similar schemas: - -{% highlight sql %} -sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin -sqlline> explain plan for select name from emps; -+-----------------------------------------------------+ -| PLAN | -+-----------------------------------------------------+ -| EnumerableCalcRel(expr#0..9=[{inputs}], NAME=[$t1]) | -| EnumerableTableAccessRel(table=[[SALES, EMPS]]) | -+-----------------------------------------------------+ -sqlline> !connect jdbc:calcite:model=target/test-classes/smart.json admin admin -sqlline> explain plan for select name from emps; -+-----------------------------------------------------+ -| PLAN | -+-----------------------------------------------------+ -| EnumerableCalcRel(expr#0..9=[{inputs}], NAME=[$t1]) | -| CsvTableScan(table=[[SALES, EMPS]]) | -+-----------------------------------------------------+ -{% endhighlight %} - -What causes the difference in plan? Let's follow the trail of evidence. In the -<code>smart.json</code> model file, there is just one extra line: - -{% highlight json %} -flavor: "translatable" -{% endhighlight %} - -This causes a <code>CsvSchema</code> to be created with -<code>flavor = TRANSLATABLE</code>, -and its <code>createTable</code> method creates instances of -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable.java">CsvTranslatableTable</a> -rather than a <code>CsvScannableTable</code>. - -<code>CsvTranslatableTable</code> implements the -<code><a href="{{ site.apiRoot }}/org/apache/calcite/schema/TranslatableTable.html#toRel()">TranslatableTable.toRel()</a></code> -method to create -<a href="{{ site.sourceRoot }}/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableScan.java">CsvTableScan</a>. -Table scans are the leaves of a query operator tree. -The usual implementation is -<code><a href="{{ site.apiRoot }}/org/apache/calcite/adapter/enumerable/EnumerableTableScan.html">EnumerableTableScan</a></code>, -but we have created a distinctive sub-type that will cause rules to fire. - -Here is the rule in its entirety: - -{% highlight java %} -public class CsvProjectTableScanRule extends RelOptRule { - public static final CsvProjectTableScanRule INSTANCE = - new CsvProjectTableScanRule(); - - private CsvProjectTableScanRule() { - super( - operand(Project.class, - operand(CsvTableScan.class, none())), - "CsvProjectTableScanRule"); - } - - @Override - public void onMatch(RelOptRuleCall call) { - final Project project = call.rel(0); - final CsvTableScan scan = call.rel(1); - int[] fields = getProjectFields(project.getProjects()); - if (fields == null) { - // Project contains expressions more complex than just field references. - return; - } - call.transformTo( - new CsvTableScan( - scan.getCluster(), - scan.getTable(), - scan.csvTable, - fields)); - } - - private int[] getProjectFields(List<RexNode> exps) { - final int[] fields = new int[exps.size()]; - for (int i = 0; i < exps.size(); i++) { - final RexNode exp = exps.get(i); - if (exp instanceof RexInputRef) { - fields[i] = ((RexInputRef) exp).getIndex(); - } else { - return null; // not a simple projection - } - } - return fields; - } -} -{% endhighlight %} - -The constructor declares the pattern of relational expressions that will cause -the rule to fire. - -The <code>onMatch</code> method generates a new relational expression and calls -<code><a href="{{ site.apiRoot }}/org/apache/calcite/plan/RelOptRuleCall.html#transformTo(org.apache.calcite.rel.RelNode)">RelOptRuleCall.transformTo()</a></code> -to indicate that the rule has fired successfully. - -## The query optimization process - -There's a lot to say about how clever Calcite's query planner is, but we won't -say it here. The cleverness is designed to take the burden off you, the writer -of planner rules. - -First, Calcite doesn't fire rules in a prescribed order. The query optimization -process follows many branches of a branching tree, just like a chess playing -program examines many possible sequences of moves. If rules A and B both match a -given section of the query operator tree, then Calcite can fire both. - -Second, Calcite uses cost in choosing between plans, but the cost model doesn't -prevent rules from firing which may seem to be more expensive in the short term. - -Many optimizers have a linear optimization scheme. Faced with a choice between -rule A and rule B, as above, such an optimizer needs to choose immediately. It -might have a policy such as "apply rule A to the whole tree, then apply rule B -to the whole tree", or apply a cost-based policy, applying the rule that -produces the cheaper result. - -Calcite doesn't require such compromises. -This makes it simple to combine various sets of rules. -If, say you want to combine rules to recognize materialized views with rules to -read from CSV and JDBC source systems, you just give Calcite the set of all -rules and tell it to go at it. - -Calcite does use a cost model. The cost model decides which plan to ultimately -use, and sometimes to prune the search tree to prevent the search space from -exploding, but it never forces you to choose between rule A and rule B. This is -important, because it avoids falling into local minima in the search space that -are not actually optimal. - -Also (you guessed it) the cost model is pluggable, as are the table and query -operator statistics it is based upon. But that can be a subject for later. - -## JDBC adapter - -The JDBC adapter maps a schema in a JDBC data source as a Calcite schema. - -For example, this schema reads from a MySQL "foodmart" database: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'FOODMART', - schemas: [ - { - name: 'FOODMART', - type: 'custom', - factory: 'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory', - operand: { - jdbcDriver: 'com.mysql.jdbc.Driver', - jdbcUrl: 'jdbc:mysql://localhost/foodmart', - jdbcUser: 'foodmart', - jdbcPassword: 'foodmart' - } - } - ] -} -{% endhighlight %} - -(The FoodMart database will be familiar to those of you who have used -the Mondrian OLAP engine, because it is Mondrian's main test data -set. To load the data set, follow <a -href="http://mondrian.pentaho.com/documentation/installation.php#2_Set_up_test_data">Mondrian's -installation instructions</a>.) - -<b>Current limitations</b>: The JDBC adapter currently only pushes -down table scan operations; all other processing (filtering, joins, -aggregations and so forth) occurs within Calcite. Our goal is to push -down as much processing as possible to the source system, translating -syntax, data types and built-in functions as we go. If a Calcite query -is based on tables from a single JDBC database, in principle the whole -query should go to that database. If tables are from multiple JDBC -sources, or a mixture of JDBC and non-JDBC, Calcite will use the most -efficient distributed query approach that it can. - -## The cloning JDBC adapter - -The cloning JDBC adapter creates a hybrid database. The data is -sourced from a JDBC database but is read into in-memory tables the -first time each table is accessed. Calcite evaluates queries based on -those in-memory tables, effectively a cache of the database. - -For example, the following model reads tables from a MySQL -"foodmart" database: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'FOODMART_CLONE', - schemas: [ - { - name: 'FOODMART_CLONE', - type: 'custom', - factory: 'org.apache.calcite.adapter.clone.CloneSchema$Factory', - operand: { - jdbcDriver: 'com.mysql.jdbc.Driver', - jdbcUrl: 'jdbc:mysql://localhost/foodmart', - jdbcUser: 'foodmart', - jdbcPassword: 'foodmart' - } - } - ] -} -{% endhighlight %} - -Another technique is to build a clone schema on top of an existing -schema. You use the <code>source</code> property to reference a schema -defined earlier in the model, like this: - -{% highlight json %} -{ - version: '1.0', - defaultSchema: 'FOODMART_CLONE', - schemas: [ - { - name: 'FOODMART', - type: 'custom', - factory: 'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory', - operand: { - jdbcDriver: 'com.mysql.jdbc.Driver', - jdbcUrl: 'jdbc:mysql://localhost/foodmart', - jdbcUser: 'foodmart', - jdbcPassword: 'foodmart' - } - }, - { - name: 'FOODMART_CLONE', - type: 'custom', - factory: 'org.apache.calcite.adapter.clone.CloneSchema$Factory', - operand: { - source: 'FOODMART' - } - } - ] -} -{% endhighlight %} - -You can use this approach to create a clone schema on any type of -schema, not just JDBC. - -The cloning adapter isn't the be-all and end-all. We plan to develop -more sophisticated caching strategies, and a more complete and -efficient implementation of in-memory tables, but for now the cloning -JDBC adapter shows what is possible and allows us to try out our -initial implementations. - -## Further topics - -### Defining a custom schema - -(To be written.) - -### Modifying data - -How to enable DML operations (INSERT, UPDATE and DELETE) on your schema. - -(To be written.) - -### Calling conventions - -(To be written.) - -### Statistics and cost - -(To be written.) - -### Defining and using user-defined functions - -(To be written.) - -### Defining tables in a schema - -(To be written.) - -### Defining custom tables - -(To be written.) - -### Built-in SQL implementation - -How does Calcite implement SQL, if an adapter does not implement all of the core -relational operators? - -(To be written.) - -### Table functions - -(To be written.) http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_includes/header.html ---------------------------------------------------------------------- diff --git a/avatica/site/_includes/header.html b/avatica/site/_includes/header.html index 7d42786..e901cc4 100644 --- a/avatica/site/_includes/header.html +++ b/avatica/site/_includes/header.html @@ -5,7 +5,7 @@ <div class="grid"> <div class="unit one-third center-on-mobiles"> <h1> - <a href="{{ site.baseurl }}/"> + <a href="{{ site.baseurl }}/.."> <span class="sr-only">Apache Calcite</span> <img src="{{ site.baseurl }}/img/logo.png" width="226" height="140" alt="Calcite Logo"> </a> http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_includes/news_contents.html ---------------------------------------------------------------------- diff --git a/avatica/site/_includes/news_contents.html b/avatica/site/_includes/news_contents.html index d1481e7..09176b8 100644 --- a/avatica/site/_includes/news_contents.html +++ b/avatica/site/_includes/news_contents.html @@ -5,7 +5,7 @@ <a href="{{ site.baseurl }}/news/">All News</a> </li> <li class="{% if page.title == 'Releases' %}current{% endif %}"> - <a href="{{ site.baseurl }}/news/releases/">Calcite Releases</a> + <a href="{{ site.baseurl }}/news/releases/">Avatica Releases</a> </li> </ul> <h4>Recent Releases</h4> http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2014-06-27-release-0.8.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2014-06-27-release-0.8.0-incubating.md b/avatica/site/_posts/2014-06-27-release-0.8.0-incubating.md deleted file mode 100644 index 83f975d..0000000 --- a/avatica/site/_posts/2014-06-27-release-0.8.0-incubating.md +++ /dev/null @@ -1,31 +0,0 @@ ---- -layout: news_item -date: "2014-06-27 00:00:00 -0800" -author: jhyde -version: 0.8 -tag: v0-8 -sha: 3da850a1 -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Several new features, including a heuristic rule to plan queries with -a large number of joins, a number of windowed aggregate functions, and -new utility, `SqlRun`. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2014-08-19-release-0.9.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2014-08-19-release-0.9.0-incubating.md b/avatica/site/_posts/2014-08-19-release-0.9.0-incubating.md deleted file mode 100644 index 7ca65ac..0000000 --- a/avatica/site/_posts/2014-08-19-release-0.9.0-incubating.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -layout: news_item -date: "2014-08-19 00:00:00 -0800" -author: jhyde -version: 0.9.0-incubating -fullVersion: apache-optiq-0.9.0-incubating -tag: v0-9-0 -sha: 45e5269b -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -This is the first release under the Apache incubator process. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2014-10-02-release-0.9.1-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2014-10-02-release-0.9.1-incubating.md b/avatica/site/_posts/2014-10-02-release-0.9.1-incubating.md deleted file mode 100644 index bca9113..0000000 --- a/avatica/site/_posts/2014-10-02-release-0.9.1-incubating.md +++ /dev/null @@ -1,29 +0,0 @@ ---- -layout: news_item -date: "2014-10-02 00:00:00 -0800" -author: jhyde -version: 0.9.1-incubating -tag: v0-9-1 -sha: 68012573 -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -This is the first release as Calcite. (The project was previously called Optiq.) http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2014-11-05-release-0.9.2-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2014-11-05-release-0.9.2-incubating.md b/avatica/site/_posts/2014-11-05-release-0.9.2-incubating.md deleted file mode 100644 index 47741c0..0000000 --- a/avatica/site/_posts/2014-11-05-release-0.9.2-incubating.md +++ /dev/null @@ -1,32 +0,0 @@ ---- -layout: news_item -date: "2014-11-05 00:00:00 -0800" -author: jhyde -version: 0.9.2-incubating -tag: v0-9-2 -sha: 0404fd23 -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -A fairly minor release, and last release before we rename all of the -packages and lots of classes, in what we expect to call 1.0. If you -have an existing application, it's worth upgrading to this first, -before you move on to 1.0. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-01-31-release-1.0.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-01-31-release-1.0.0-incubating.md b/avatica/site/_posts/2015-01-31-release-1.0.0-incubating.md deleted file mode 100644 index ad60205..0000000 --- a/avatica/site/_posts/2015-01-31-release-1.0.0-incubating.md +++ /dev/null @@ -1,42 +0,0 @@ ---- -layout: news_item -date: "2015-01-31 19:03:07 -0800" -author: jhyde -version: 1.0.0-incubating -tag: v1-0-0 -sha: 2dd83f2 -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Calcite's first major release. - -Since the previous release we have re-organized the into the `org.apache.calcite` -namespace. To make migration of your code easier, we have described the -<a href="https://issues.apache.org/jira/secure/attachment/12681620/mapping.txt">mapping from old to new class names</a> -as an attachment to -[<a href="https://issues.apache.org/jira/browse/CALCITE-296">CALCITE-296</a>]. - -The release adds SQL support for `GROUPING SETS`, `EXTEND`, `UPSERT` and sequences; -a remote JDBC driver; -improvements to the planner engine and built-in planner rules; -improvements to the algorithms that implement the relational algebra, -including an interpreter that can evaluate queries without compilation; -and fixes about 30 bugs. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-03-13-release-1.1.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-03-13-release-1.1.0-incubating.md b/avatica/site/_posts/2015-03-13-release-1.1.0-incubating.md deleted file mode 100644 index 327b023..0000000 --- a/avatica/site/_posts/2015-03-13-release-1.1.0-incubating.md +++ /dev/null @@ -1,41 +0,0 @@ ---- -layout: news_item -date: "2015-03-13 19:03:07 -0800" -author: jhyde -version: 1.1.0-incubating -tag: v1-1-0 -sha: f10ea367 -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -This Calcite release makes it possible to exploit physical properties -of relational expressions to produce more efficient plans, introducing -collation and distribution as traits, `Exchange` relational operator, -and several new forms of metadata. - -We add experimental support for streaming SQL. - -This release drops support for JDK 1.6; Calcite now requires 1.7 or -later. - -We have introduced static `create` methods for many sub-classes of -`RelNode`. We strongly suggest that you use these rather than -calling constructors directly. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-04-07-release-1.2.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-04-07-release-1.2.0-incubating.md b/avatica/site/_posts/2015-04-07-release-1.2.0-incubating.md deleted file mode 100644 index 8332063..0000000 --- a/avatica/site/_posts/2015-04-07-release-1.2.0-incubating.md +++ /dev/null @@ -1,41 +0,0 @@ ---- -layout: news_item -date: "2015-04-07 19:03:07 -0800" -author: jhyde -version: 1.2.0-incubating -tag: v1-2-0 -sha: d60f2aa -categories: [release] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -A short release, less than a month after 1.1. - -There have been many changes to Avatica, hugely improving its coverage of the -JDBC API and overall robustness. A new provider, `JdbcMeta`, allows -you to remote an existing JDBC driver. - -[<a href="https://issues.apache.org/jira/browse/CALCITE-606">CALCITE-606</a>] -improves how the planner propagates traits such as collation and -distribution among relational expressions. - -[<a href="https://issues.apache.org/jira/browse/CALCITE-613">CALCITE-613</a>] -and [<a href="https://issues.apache.org/jira/browse/CALCITE-307">CALCITE-307</a>] -improve implicit and explicit conversions in SQL. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-04-24-new-committers.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-04-24-new-committers.md b/avatica/site/_posts/2015-04-24-new-committers.md deleted file mode 100644 index 92a2466..0000000 --- a/avatica/site/_posts/2015-04-24-new-committers.md +++ /dev/null @@ -1,34 +0,0 @@ ---- -layout: news_item -title: "Calcite adds 5 committers" -date: "2015-04-24 19:03:07 -0800" -author: jhyde -categories: [team] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -The Calcite project management committee today added five new -committers for their work on Calcite. Welcome all! - -* Aman Sinha -* Jesús Camacho-RodrÃguez -* Jinfeng Ni -* John Pullokkaran -* Nick Dimiduk http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-05-30-release-1.3.0-incubating.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-05-30-release-1.3.0-incubating.md b/avatica/site/_posts/2015-05-30-release-1.3.0-incubating.md deleted file mode 100644 index b72765f..0000000 --- a/avatica/site/_posts/2015-05-30-release-1.3.0-incubating.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -layout: news_item -date: "2015-05-30 23:05:37 +0000" -author: jhyde -version: 1.3.0-incubating -categories: [release] -tag: v1-3-0 -sha: 495f1859 ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Mainly bug-fixes, but this release adds support for -<a href="https://issues.apache.org/jira/browse/CALCITE-505">modifiable views</a> -and -<a href="https://issues.apache.org/jira/browse/CALCITE-704">filtered aggregate functions</a> -and various improvements to Avatica. http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-06-05-algebra-builder.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-06-05-algebra-builder.md b/avatica/site/_posts/2015-06-05-algebra-builder.md deleted file mode 100644 index ddf3377..0000000 --- a/avatica/site/_posts/2015-06-05-algebra-builder.md +++ /dev/null @@ -1,87 +0,0 @@ ---- -layout: news_item -title: "Algebra builder" -date: "2015-06-05 19:29:07 -0800" -author: jhyde -categories: ["new features"] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Calcite's foundation is a comprehensive implementation of relational -algebra (together with transformation rules, cost model, and metadata) -but to create algebra expressions you had to master a complex API. - -We're solving this problem by introducing an -[algebra builder]({{ site.apiRoot }}/org/apache/calcite/tools/RelBuilder.html), -a single class with all the methods you need to build any relational -expression. - -For example, - -{% highlight java %} -final FrameworkConfig config; -final RelBuilder builder = RelBuilder.create(config); -final RelNode node = builder - .scan("EMP") - .aggregate(builder.groupKey("DEPTNO"), - builder.count(false, "C"), - builder.sum(false, "S", builder.field("SAL"))) - .filter( - builder.call(SqlStdOperatorTable.GREATER_THAN, - builder.field("C"), - builder.literal(10))) - .build(); -System.out.println(RelOptUtil.toString(node)); -{% endhighlight %} - -creates the algebra - -{% highlight text %} -LogicalFilter(condition=[>($1, 10)]) - LogicalAggregate(group=[{7}], C=[COUNT()], S=[SUM($5)]) - LogicalTableScan(table=[[scott, EMP]]) -{% endhighlight %} - -which is equivalent to the SQL - -{% highlight sql %} -SELECT deptno, count(*) AS c, sum(sal) AS s -FROM emp -GROUP BY deptno -HAVING count(*) > 10 -{% endhighlight %} - -The [algebra builder documentation]({{ site.baseurl }}/docs/algebra.html) describes the -full API and has lots of examples. - -We're still working on the algebra builder, but plan to release it -with Calcite 1.4 (see -[[CALCITE-748](https://issues.apache.org/jira/browse/CALCITE-748)]). - -The algebra builder will make some existing tasks easier (such as -writing planner rules), but will also enable new things, such as -writing applications directly on top of Calcite, or implementing -non-SQL query languages. These applications and languages will be able -to take advantage of Calcite's existing back-ends (including -Hive-on-Tez, Drill, MongoDB, Splunk, Spark, JDBC data sources) and -extensive set of query-optimization rules. - -If you have questions or comments, please post to the -[mailing list]({{ site.baseurl }}/develop). http://git-wip-us.apache.org/repos/asf/calcite/blob/65f2afa7/avatica/site/_posts/2015-07-31-xldb-best-lightning-talk.md ---------------------------------------------------------------------- diff --git a/avatica/site/_posts/2015-07-31-xldb-best-lightning-talk.md b/avatica/site/_posts/2015-07-31-xldb-best-lightning-talk.md deleted file mode 100644 index b0e1f46..0000000 --- a/avatica/site/_posts/2015-07-31-xldb-best-lightning-talk.md +++ /dev/null @@ -1,41 +0,0 @@ ---- -layout: news_item -title: "XLDB 2015 best lightning talk" -date: "2015-07-31 11:53:00 -0800" -author: jhyde -categories: ["talks"] ---- -<!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> - -Julian Hyde's talk Apache Calcite: One planner fits all won -[Best Lightning Talk](http://www.xldb.org/archives/2015/05/best-lightning-talks-selected/) -at the XLDB-2015 conference (with Eric Tschetter's talk "Sketchy -Approximations"). - -XLDB is an annual conference that brings together experts from -science, industry and academia to find practical solutions to problems -involving extremely large data sets. - -As a result of winning Best Lightning Talk, Julian will get a 30 -minute keynote speaking slot at XLDB-2016. - -The talk is available in -[slides](http://www.slideshare.net/julianhyde/apache-calcite-one-planner-fits-all) -and [video](https://www.youtube.com/watch?v=5_MyORYjq3w).
