http://git-wip-us.apache.org/repos/asf/calcite/blob/5cee486f/avatica/site/_docs/howto.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/howto.md b/avatica/site/_docs/howto.md new file mode 100644 index 0000000..13f2e72 --- /dev/null +++ b/avatica/site/_docs/howto.md @@ -0,0 +1,775 @@ +--- +layout: docs +title: HOWTO +permalink: /docs/howto.html +--- + +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +Here's some miscellaneous documentation about using Calcite and its various +adapters. + +* TOC +{:toc} + +## Building from a source distribution + +Prerequisites are maven (3.2.1 or later) +and Java (JDK 1.7 or later, 1.8 preferred) on your path. + +Unpack the source distribution `.tar.gz` or `.zip` file, +`cd` to the root directory of the unpacked source, +then build using maven: + +{% highlight bash %} +$ tar xvfz calcite-1.6.0-source.tar.gz +$ cd calcite-1.6.0 +$ mvn install +{% endhighlight %} + +[Running tests](#running-tests) describes how to run more or fewer +tests. + +## Building from git + +Prerequisites are git, maven (3.2.1 or later) +and Java (JDK 1.7 or later, 1.8 preferred) on your path. + +Create a local copy of the github repository, +`cd` to its root directory, +then build using maven: + +{% highlight bash %} +$ git clone git://github.com/apache/calcite.git +$ cd calcite +$ mvn install +{% endhighlight %} + +Calcite includes a number of machine-generated codes. By default, these are +regenerated on every build, but this has the negative side-effect of causing +a re-compilation of the entire project when the non-machine-generated code +has not changed. To make sure incremental compilation still works as intended, +provide the `skipGenerate` command line option with your maven command. +If you invoke the `clean` lifecycle phase, you must not specify the +`skipGenerate` option as it will not recompile the necessary code for the build +to succeed. + +{% highlight bash %} +$ mvn clean +$ mvn package +... hacks ... +$ mvn package -DskipGenerate +{% endhighlight %} + +[Running tests](#running-tests) describes how to run more or fewer +tests. + +## Running tests + +The test suite will run by default when you build, unless you specify +`-DskipTests`: + +{% highlight bash %} +$ mvn -DskipTests clean install +$ mvn test +{% endhighlight %} + +There are other options that control which tests are run, and in what +environment, as follows. + +* `-Dcalcite.test.db=DB` (where db is `h2`, `hsqldb`, `mysql`, or `postgresql`) allows you + to change the JDBC data source for the test suite. Calcite's test + suite requires a JDBC data source populated with the foodmart data + set. + * `hsqldb`, the default, uses an in-memory hsqldb database. + * All others access a test virtual machine + (see [integration tests](#running-integration-tests) below). + `mysql` and `postgresql` might be somewhat faster than hsqldb, but you need + to populate it (i.e. provision a VM). +* `-Dcalcite.debug` prints extra debugging information to stdout. +* `-Dcalcite.test.slow` enables tests that take longer to execute. For + example, there are tests that create virtual TPC-H and TPC-DS schemas + in-memory and run tests from those benchmarks. +* `-Dcalcite.test.splunk` enables tests that run against Splunk. + Splunk must be installed and running. + +## Running integration tests + +For testing Calcite's external adapters, a test virtual machine should be used. +The VM includes H2, HSQLDB, MySQL, MongoDB, and PostgreSQL. + +Test VM requires 5GiB of disk space and it takes 30 minutes to build. + +Note: you can use [calcite-test-dataset](https://github.com/vlsi/calcite-test-dataset) + to populate your own database, however it is recommended to use test VM so the test environment can be reproduced. + +### VM preparation + +0) Install dependencies: [Vagrant](https://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/) + +1) Clone https://github.com/vlsi/calcite-test-dataset.git at the same level as calcite repository. +For instance: + +{% highlight bash %} +code + +-- calcite + +-- calcite-test-dataset +{% endhighlight %} + +Note: integration tests search for ../calcite-test-dataset or ../../calcite-test-dataset. + You can specify full path via calcite.test.dataset system property. + +2) Build and start the VM: + +{% highlight bash %} +cd calcite-test-dataset && mvn install +{% endhighlight %} + +### VM management + +Test VM is provisioned by Vagrant, so regular Vagrant `vagrant up` and `vagrant halt` should be used to start and stop the VM. +The connection strings for different databases are listed in [calcite-test-dataset](https://github.com/vlsi/calcite-test-dataset) readme. + +### Suggested test flow + +Note: test VM should be started before you launch integration tests. Calcite itself does not start/stop the VM. + +Command line: + +* Executing regular unit tests (does not require external data): no change. `mvn test` or `mvn install`. +* Executing all tests, for all the DBs: `mvn verify -Pit`. `it` stands for "integration-test". `mvn install -Pit` works as well. +* Executing just tests for external DBs, excluding unit tests: `mvn -Dtest=foo -DfailIfNoTests=false -Pit verify` +* Executing just MongoDB tests: `cd mongo; mvn verify -Pit` + +From within IDE: + +* Executing regular unit tests: no change. +* Executing MongoDB tests: run `MongoAdapterIT.java` as usual (no additional properties are required) +* Executing MySQL tests: run `JdbcTest` and `JdbcAdapterTest` with setting `-Dcalcite.test.db=mysql` +* Executing PostgreSQL tests: run `JdbcTest` and `JdbcAdapterTest` with setting `-Dcalcite.test.db=postgresql` + +### Integration tests technical details + +Tests with external data are executed at maven's integration-test phase. +We do not currently use pre-integration-test/post-integration-test, however we could use that in future. +The verification of build pass/failure is performed at verify phase. +Integration tests should be named `...IT.java`, so they are not picked up on unit test execution. + +## Contributing + +See the [developers guide]({{ site.baseurl }}/develop/#contributing). + +## Getting started + +See the [developers guide]({{ site.baseurl }}/develop/#getting-started). + +## Tracing + +To enable tracing, add the following flags to the java command line: + +`-Dcalcite.debug=true` + +The first flag causes Calcite to print the Java code it generates +(to execute queries) to stdout. It is especially useful if you are debugging +mysterious problems like this: + +`Exception in thread "main" java.lang.ClassCastException: Integer cannot be cast to Long + at Baz$1$1.current(Unknown Source)` + +By default, Calcite uses the Log4j bindings for SLF4J. There is a provided configuration +file which outputs logging at the INFO level to the console in `core/src/test/resources/log4j.properties`. +You can modify the level for the rootLogger to increase verbosity or change the level +for a specific class if you so choose. + +{% highlight properties %} +# Change rootLogger level to WARN +log4j.rootLogger=WARN, A1 +# Increase level to DEBUG for RelOptPlanner +log4j.logger.org.apache.calcite.plan.RelOptPlanner=DEBUG +# Increase level to TRACE for HepPlanner +log4j.logger.org.apache.calcite.plan.hep.HepPlanner=TRACE +{% endhighlight %} + +## CSV adapter + +See the [tutorial](/docs/tutorial.html). + +## MongoDB adapter + +First, download and install Calcite, +and <a href="http://www.mongodb.org/downloads">install MongoDB</a>. + +Note: you can use MongoDB from integration test virtual machine above. + +Import MongoDB's zipcode data set into MongoDB: + +{% highlight bash %} +$ curl -o /tmp/zips.json http://media.mongodb.org/zips.json +$ mongoimport --db test --collection zips --file /tmp/zips.json +Tue Jun 4 16:24:14.190 check 9 29470 +Tue Jun 4 16:24:14.469 imported 29470 objects +{% endhighlight %} + +Log into MongoDB to check it's there: + +{% highlight bash %} +$ mongo +MongoDB shell version: 2.4.3 +connecting to: test +> db.zips.find().limit(3) +{ "city" : "ACMAR", "loc" : [ -86.51557, 33.584132 ], "pop" : 6055, "state" : "AL", "_id" : "35004" } +{ "city" : "ADAMSVILLE", "loc" : [ -86.959727, 33.588437 ], "pop" : 10616, "state" : "AL", "_id" : "35005" } +{ "city" : "ADGER", "loc" : [ -87.167455, 33.434277 ], "pop" : 3205, "state" : "AL", "_id" : "35006" } +> exit +bye +{% endhighlight %} + +Connect using the +[mongo-zips-model.json]({{ site.sourceRoot }}/mongodb/src/test/resources/mongo-zips-model.json) +Calcite model: + +{% highlight bash %} +$ ./sqlline +sqlline> !connect jdbc:calcite:model=mongodb/target/test-classes/mongo-zips-model.json admin admin +Connecting to jdbc:calcite:model=mongodb/target/test-classes/mongo-zips-model.json +Connected to: Calcite (version 1.x.x) +Driver: Calcite JDBC Driver (version 1.x.x) +Autocommit status: true +Transaction isolation: TRANSACTION_REPEATABLE_READ +sqlline> !tables ++------------+--------------+-----------------+---------------+ +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | ++------------+--------------+-----------------+---------------+ +| null | mongo_raw | zips | TABLE | +| null | mongo_raw | system.indexes | TABLE | +| null | mongo | ZIPS | VIEW | +| null | metadata | COLUMNS | SYSTEM_TABLE | +| null | metadata | TABLES | SYSTEM_TABLE | ++------------+--------------+-----------------+---------------+ +sqlline> select count(*) from zips; ++---------+ +| EXPR$0 | ++---------+ +| 29467 | ++---------+ +1 row selected (0.746 seconds) +sqlline> !quit +Closing: org.apache.calcite.jdbc.FactoryJdbc41$CalciteConnectionJdbc41 +$ +{% endhighlight %} + +## Splunk adapter + +To run the test suite and sample queries against Splunk, +load Splunk's `tutorialdata.zip` data set as described in +<a href="http://docs.splunk.com/Documentation/Splunk/6.0.2/PivotTutorial/GetthetutorialdataintoSplunk">the Splunk tutorial</a>. + +(This step is optional, but it provides some interesting data for the sample +queries. It is also necessary if you intend to run the test suite, using +`-Dcalcite.test.splunk=true`.) + +## Implementing an adapter + +New adapters can be created by implementing `CalcitePrepare.Context`: + +{% highlight java %} +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.calcite.jdbc.CalciteSchema; + +public class AdapterContext implements CalcitePrepare.Context { + @Override + public JavaTypeFactory getTypeFactory() { + // adapter implementation + return typeFactory; + } + + @Override + public CalciteSchema getRootSchema() { + // adapter implementation + return rootSchema; + } +} +{% endhighlight %} + +### Testing adapter in Java + +The example below shows how SQL query can be submitted to +`CalcitePrepare` with a custom context (`AdapterContext` in this +case). Calcite prepares and implements the query execution, using the +resources provided by the `Context`. `CalcitePrepare.PrepareResult` +provides access to the underlying enumerable and methods for +enumeration. The enumerable itself can naturally be some adapter +specific implementation. + +{% highlight java %} +import org.apache.calcite.jdbc.CalcitePrepare; +import org.apache.calcite.prepare.CalcitePrepareImpl; +import org.junit.Test; + +public class AdapterContextTest { + @Test + public void testSelectAllFromTable() { + AdapterContext ctx = new AdapterContext(); + String sql = "SELECT * FROM TABLENAME"; + Class elementType = Object[].class; + CalcitePrepare.PrepareResult<Object> prepared = + new CalcitePrepareImpl().prepareSql(ctx, sql, null, elementType, -1); + Object enumerable = prepared.getExecutable(); + // etc. + } +} +{% endhighlight %} + +# Advanced topics for developers + +The following sections might be of interest if you are adding features +to particular parts of the code base. You don't need to understand +these topics if you are just building from source and running tests. + +## JavaTypeFactory + +When Calcite compares types (instances of `RelDataType`), it requires them to be the same +object. If there are two distinct type instances that refer to the +same Java type, Calcite may fail to recognize that they match. It is +recommended to: + +* Use a single instance of `JavaTypeFactory` within the calcite context; +* Store the types so that the same object is always returned for the same type. + +## Rebuilding generated Protocol Buffer code + +Calcite's Avatica Server component supports RPC serialization +using [Protocol Buffers](https://developers.google.com/protocol-buffers/). +In the context of Avatica, Protocol Buffers can +generate a collection of messages defined by a schema. The library +itself can parse old serialized messages using a +new schema. This is highly desirable in an environment where the +client and server are not guaranteed to have the same version of +objects. + +Typically, the code generated by the Protocol Buffers library doesn't +need to be re-generated only every build, only when the schema changes. + +First, install Protobuf 3.0: + +{% highlight bash %} +$ wget https://github.com/google/protobuf/releases/download/v3.0.0-beta-1/protobuf-java-3.0.0-beta-1.tar.gz +$ tar xf protobuf-java-3.0.0-beta-1.tar.gz && cd protobuf-3.0.0-beta-1 +$ ./configure +$ make +$ sudo make install +{% endhighlight %} + +Then, re-generate the compiled code: + +{% highlight bash %} +$ cd avatica +$ ./src/main/scripts/generate-protobuf.sh +{% endhighlight %} + +# Advanced topics for committers + +The following sections are of interest to Calcite committers and in +particular release managers. + +## Set up PGP signing keys (for Calcite committers) + +Follow instructions [here](http://www.apache.org/dev/release-signing) to +create a key pair. (On Mac OS X, I did `brew install gpg` and +`gpg --gen-key`.) + +Add your public key to the `KEYS` file by following instructions in +the `KEYS` file. + +## Making a snapshot (for Calcite committers) + +Before you start: + +* Set up signing keys as described above. +* Make sure you are using JDK 1.7 (not 1.8). +* Make sure build and tests succeed with `-Dcalcite.test.db=hsqldb` (the default) + +{% highlight bash %} +# Set passphrase variable without putting it into shell history +read -s GPG_PASSPHRASE + +# Make sure that there are no junk files in the sandbox +git clean -xn +mvn clean + +mvn -Papache-release -Dgpg.passphrase=${GPG_PASSPHRASE} install +{% endhighlight %} + +When the dry-run has succeeded, change `install` to `deploy`. + +## Making a release (for Calcite committers) + +Before you start: + +* Set up signing keys as described above. +* Make sure you are using JDK 1.7 (not 1.8). +* Check that `README` and `site/_docs/howto.md` have the correct version number. +* Set `version.major` and `version.minor` in `pom.xml`. +* Make sure build and tests succeed, including with + -Dcalcite.test.db={mysql,hsqldb}, -Dcalcite.test.slow, + -Dcalcite.test.mongodb, -Dcalcite.test.splunk. +* Trigger a + <a href="https://scan.coverity.com/projects/2966">Coverity scan</a> + by merging the latest code into the `julianhyde/coverity_scan` branch, + and when it completes, make sure that there are no important issues. +* Make sure that + <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20status%20%3D%20Resolved%20and%20fixVersion%20is%20null"> + every "resolved" JIRA case</a> (including duplicates) has + a fix version assigned (most likely the version we are + just about to release) + +Create a release branch named after the release, e.g. `branch-1.1`, and push it to Apache. + +{% highlight bash %} +$ git checkout -b branch-X.Y +$ git push -u origin branch-X.Y +{% endhighlight %} + +We will use the branch for the entire the release process. Meanwhile, +we do not allow commits to the master branch. After the release is +final, we can use `git merge --ff-only` to append the changes on the +release branch onto the master branch. (Apache does not allow reverts +to the master branch, which makes it difficult to clean up the kind of +messy commits that inevitably happen while you are trying to finalize +a release.) + +Now, set up your environment and do a dry run. The dry run will not +commit any changes back to git and gives you the opportunity to verify +that the release process will complete as expected. + +If any of the steps fail, clean up (see below), fix the problem, and +start again from the top. + +{% highlight bash %} +# Set passphrase variable without putting it into shell history +read -s GPG_PASSPHRASE + +# Make sure that there are no junk files in the sandbox +git clean -xn +mvn clean + +# Do a dry run of the release:prepare step, which sets version numbers. +mvn -DdryRun=true -DskipTests -DreleaseVersion=X.Y.Z -DdevelopmentVersion=X.Y.Z+1-SNAPSHOT -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:prepare 2>&1 | tee /tmp/prepare-dry.log +{% endhighlight %} + +Check the artifacts: + +* In the `target` directory should be these 8 files, among others: + * apache-calcite-X.Y.Z-src.tar.gz + * apache-calcite-X.Y.Z-src.tar.gz.asc + * apache-calcite-X.Y.Z-src.tar.gz.md5 + * apache-calcite-X.Y.Z-src.tar.gz.sha1 + * apache-calcite-X.Y.Z-src.zip + * apache-calcite-X.Y.Z-src.zip.asc + * apache-calcite-X.Y.Z-src.zip.md5 + * apache-calcite-X.Y.Z-src.zip.sha1 +* Note that the file names start `apache-calcite-`. +* In the two source distros `.tar.gz` and `.zip` (currently there is + no binary distro), check that all files belong to a directory called + `apache-calcite-X.Y.Z-src`. +* That directory must contain files `NOTICE`, `LICENSE`, + `README`, `README.md` + * Check that the version in `README` is correct +* In each .jar (for example + `core/target/calcite-core-X.Y.Z.jar` and + `mongodb/target/calcite-mongodb-X.Y.Z-sources.jar`), check + that the `META-INF` directory contains `DEPENDENCIES`, `LICENSE`, + `NOTICE` and `git.properties` +* In each .jar, check that `org-apache-calcite-jdbc.properties` is + present and does not contain un-substituted `${...}` variables +* Check PGP, per [this](https://httpd.apache.org/dev/verification.html) + +Now, remove the `-DdryRun` flag and run the release for real. + +{% highlight bash %} +# Prepare sets the version numbers, creates a tag, and pushes it to git. +mvn -DdryRun=false -DskipTests -DreleaseVersion=X.Y.Z -DdevelopmentVersion=X.Y.Z+1-SNAPSHOT -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:prepare 2>&1 | tee /tmp/prepare.log + +# Perform checks out the tagged version, builds, and deploys to the staging repository +mvn -DskipTests -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:perform 2>&1 | tee /tmp/perform.log +{% endhighlight %} + +Verify the staged artifacts in the Nexus repository: + +* Go to [https://repository.apache.org/](https://repository.apache.org/) and login +* Under `Build Promotion`, click `Staging Repositories` +* In the `Staging Repositories` tab there should be a line with profile `org.apache.calcite` +* Navigate through the artifact tree and make sure the .jar, .pom, .asc files are present +* Check the box on in the first column of the row, + and press the 'Close' button to publish the repository at + https://repository.apache.org/content/repositories/orgapachecalcite-1000 + (or a similar URL) + +Upload the artifacts via subversion to a staging area, +https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-X.Y.Z-rcN: + +{% highlight bash %} +# Create a subversion workspace, if you haven't already +mkdir -p ~/dist/dev +pushd ~/dist/dev +svn co https://dist.apache.org/repos/dist/dev/calcite +popd + +# Move the files into a directory +cd target +mkdir ~/dist/dev/calcite/apache-calcite-X.Y.Z-rcN +mv apache-calcite-* ~/dist/dev/calcite/apache-calcite-X.Y.Z-rcN + +# Check in +cd ~/dist/dev/calcite +svn add apache-calcite-X.Y.Z-rcN +svn ci +{% endhighlight %} + +## Cleaning up after a failed release attempt (for Calcite committers) + +{% highlight bash %} +# Make sure that the tag you are about to generate does not already +# exist (due to a failed release attempt) +git tag + +# If the tag exists, delete it locally and remotely +git tag -d apache-calcite-X.Y.Z +git push origin :refs/tags/apache-calcite-X.Y.Z + +# Remove modified files +mvn release:clean + +# Check whether there are modified files and if so, go back to the +# original git commit +git status +git reset --hard HEAD +{% endhighlight %} + +## Validate a release + +{% highlight bash %} +# Check that the signing key (e.g. 2AD3FAE3) is pushed +gpg --recv-keys key + +# Check keys +curl -O https://dist.apache.org/repos/dist/release/calcite/KEYS + +# Sign/check md5 and sha1 hashes +# (Assumes your O/S has 'md5' and 'sha1' commands.) +function checkHash() { + cd "$1" + for i in *.{zip,pom,gz}; do + if [ ! -f $i ]; then + continue + fi + if [ -f $i.md5 ]; then + if [ "$(cat $i.md5)" = "$(md5 -q $i)" ]; then + echo $i.md5 present and correct + else + echo $i.md5 does not match + fi + else + md5 -q $i > $i.md5 + echo $i.md5 created + fi + if [ -f $i.sha1 ]; then + if [ "$(cat $i.sha1)" = "$(sha1 -q $i)" ]; then + echo $i.sha1 present and correct + else + echo $i.sha1 does not match + fi + else + sha1 -q $i > $i.sha1 + echo $i.sha1 created + fi + done +} +checkHash apache-calcite-X.Y.Z-rcN +{% endhighlight %} + +## Get approval for a release via Apache voting process (for Calcite committers) + +Release vote on dev list + +{% highlight text %} +To: [email protected] +Subject: [VOTE] Release apache-calcite-X.Y.Z (release candidate N) + +Hi all, + +I have created a build for Apache Calcite X.Y.Z, release candidate N. + +Thanks to everyone who has contributed to this release. +<Further details about release.> You can read the release notes here: +https://github.com/apache/calcite/blob/XXXX/site/_docs/history.md + +The commit to be voted upon: +http://git-wip-us.apache.org/repos/asf/calcite/commit/NNNNNN + +Its hash is XXXX. + +The artifacts to be voted on are located here: +https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-X.Y.Z-rcN/ + +The hashes of the artifacts are as follows: +src.tar.gz.md5 XXXX +src.tar.gz.sha1 XXXX +src.zip.md5 XXXX +src.zip.sha1 XXXX + +A staged Maven repository is available for review at: +https://repository.apache.org/content/repositories/orgapachecalcite-NNNN + +Release artifacts are signed with the following key: +https://people.apache.org/keys/committer/jhyde.asc + +Please vote on releasing this package as Apache Calcite X.Y.Z. + +The vote is open for the next 72 hours and passes if a majority of +at least three +1 PMC votes are cast. + +[ ] +1 Release this package as Apache Calcite X.Y.Z +[ ] 0 I don't feel strongly about it, but I'm okay with the release +[ ] -1 Do not release this package because... + + +Here is my vote: + ++1 (binding) + +Julian +{% endhighlight %} + +After vote finishes, send out the result: + +{% highlight text %} +Subject: [RESULT] [VOTE] Release apache-calcite-X.Y.Z (release candidate N) +To: [email protected] + +Thanks to everyone who has tested the release candidate and given +their comments and votes. + +The tally is as follows. + +N binding +1s: +<names> + +N non-binding +1s: +<names> + +No 0s or -1s. + +Therefore I am delighted to announce that the proposal to release +Apache Calcite X.Y.Z has passed. + +Thanks everyone. Weâll now roll the release out to the mirrors. + +There was some feedback during voting. I shall open a separate +thread to discuss. + + +Julian +{% endhighlight %} + +Use the [Apache URL shortener](http://s.apache.org) to generate +shortened URLs for the vote proposal and result emails. Examples: +[s.apache.org/calcite-1.2-vote](http://s.apache.org/calcite-1.2-vote) and +[s.apache.org/calcite-1.2-result](http://s.apache.org/calcite-1.2-result). + + +## Publishing a release (for Calcite committers) + +After a successful release vote, we need to push the release +out to mirrors, and other tasks. + +Choose a release date. +This is based on the time when you expect to announce the release. +This is usually a day after the vote closes. +Remember that UTC date changes at 4pm Pacific time. + +In JIRA, search for +[all issues resolved in this release](https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20and%20fixVersion%20%3D%201.5.0%20and%20status%20%3D%20Resolved%20and%20resolution%20%3D%20Fixed), +and do a bulk update changing their status to "Closed", +with a change comment +"Resolved in release X.Y.Z (YYYY-MM-DD)" +(fill in release number and date appropriately). +Uncheck "Send mail for this update". + +Promote the staged nexus artifacts. + +* Go to [https://repository.apache.org/](https://repository.apache.org/) and login +* Under "Build Promotion" click "Staging Repositories" +* In the line with "orgapachecalcite-xxxx", check the box +* Press "Release" button + +Check the artifacts into svn. + +{% highlight bash %} +# Get the release candidate. +mkdir -p ~/dist/dev +cd ~/dist/dev +svn co https://dist.apache.org/repos/dist/dev/calcite + +# Copy the artifacts. Note that the copy does not have '-rcN' suffix. +mkdir -p ~/dist/release +cd ~/dist/release +svn co https://dist.apache.org/repos/dist/release/calcite +cd calcite +cp -rp ../../dev/calcite/apache-calcite-X.Y.Z-rcN apache-calcite-X.Y.Z +svn add apache-calcite-X.Y.Z + +# Check in. +svn ci +{% endhighlight %} + +Svnpubsub will publish to the +[release repo](https://dist.apache.org/repos/dist/release/calcite) and propagate to the +[mirrors](http://www.apache.org/dyn/closer.cgi/calcite) within 24 hours. + +If there are now more than 2 releases, clear out the oldest ones: + +{% highlight bash %} +cd ~/dist/release/calcite +svn rm apache-calcite-X.Y.Z +svn ci +{% endhighlight %} + +The old releases will remain available in the +[release archive](http://archive.apache.org/dist/calcite/). + +Add a release note by copying +[site/_posts/2015-11-10-release-1.5.0.md]({{ site.sourceRoot }}/site/_posts/2015-11-10-release-1.5.0.md), +generate the javadoc and copy to `site/target/apidocs` and `site/target/testapidocs`, +[publish the site](#publish-the-web-site), +and check that it appears in the contents in [news](http://localhost:4000/news/). + +After 24 hours, announce the release by sending an email to +[[email protected]](https://mail-archives.apache.org/mod_mbox/www-announce/). +You can use +[the 1.6.0 announcement](https://mail-archives.apache.org/mod_mbox/www-announce/201601.mbox/%3C8DB4C1E5-B322-4A33-8E8F-9858FA6A1119%40apache.org%3E) +as a template. Be sure to include a brief description of the project. + +## Publishing the web site (for Calcite committers) +{: #publish-the-web-site} + +See instructions in +[site/README.md]({{ site.sourceRoot }}/site/README.md).
http://git-wip-us.apache.org/repos/asf/calcite/blob/5cee486f/avatica/site/_docs/index.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/index.md b/avatica/site/_docs/index.md new file mode 100644 index 0000000..f0c4afd --- /dev/null +++ b/avatica/site/_docs/index.md @@ -0,0 +1,145 @@ +--- +layout: docs +title: Background +permalink: /docs/index.html +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +Apache Calcite is a dynamic data management framework. + +It contains many of the pieces that comprise a typical database +management system, but omits some key functions: storage of data, +algorithms to process data, and a repository for storing metadata. + +Calcite intentionally stays out of the business of storing and +processing data. As we shall see, this makes it an excellent choice +for mediating between applications and one or more data storage +locations and data processing engines. It is also a perfect foundation +for building a database: just add data. + +To illustrate, let's create an empty instance of Calcite and then +point it at some data. + +{% highlight java %} +public static class HrSchema { + public final Employee[] emps = 0; + public final Department[] depts = 0; +} +Class.forName("org.apache.calcite.jdbc.Driver"); +Properties info = new Properties(); +info.setProperty("lex", "JAVA"); +Connection connection = DriverManager.getConnection("jdbc:calcite:", info); +CalciteConnection calciteConnection = + connection.unwrap(CalciteConnection.class); +ReflectiveSchema.create(calciteConnection, + calciteConnection.getRootSchema(), "hr", new HrSchema()); +Statement statement = calciteConnection.createStatement(); +ResultSet resultSet = statement.executeQuery( + "select d.deptno, min(e.empid)\n" + + "from hr.emps as e\n" + + "join hr.depts as d\n" + + " on e.deptno = d.deptno\n" + + "group by d.deptno\n" + + "having count(*) > 1"); +print(resultSet); +resultSet.close(); +statement.close(); +connection.close(); +{% endhighlight %} + +Where is the database? There is no database. The connection is +completely empty until `ReflectiveSchema.create` registers a Java +object as a schema and its collection fields `emps` and `depts` as +tables. + +Calcite does not want to own data; it does not even have favorite data +format. This example used in-memory data sets, and processed them +using operators such as `groupBy` and `join` from the linq4j +library. But Calcite can also process data in other data formats, such +as JDBC. In the first example, replace + +{% highlight java %} +ReflectiveSchema.create(calciteConnection, + calciteConnection.getRootSchema(), "hr", new HrSchema()); +{% endhighlight %} + +with + +{% highlight java %} +Class.forName("com.mysql.jdbc.Driver"); +BasicDataSource dataSource = new BasicDataSource(); +dataSource.setUrl("jdbc:mysql://localhost"); +dataSource.setUsername("username"); +dataSource.setPassword("password"); +JdbcSchema.create(calciteConnection.getRootSchema(), "name", dataSource, + null, "hr"); +{% endhighlight %} + +and Calcite will execute the same query in JDBC. To the application, +the data and API are the same, but behind the scenes the +implementation is very different. Calcite uses optimizer rules to push +the `JOIN` and `GROUP BY` operations to the source database. + +In-memory and JDBC are just two familiar examples. Calcite can handle +any data source and data format. To add a data source, you need to +write an adapter that tells Calcite what collections in the data +source it should consider "tables". + +For more advanced integration, you can write optimizer +rules. Optimizer rules allow Calcite to access data of a new format, +allow you to register new operators (such as a better join algorithm), +and allow Calcite to optimize how queries are translated to +operators. Calcite will combine your rules and operators with built-in +rules and operators, apply cost-based optimization, and generate an +efficient plan. + +### Writing an adapter + +The subproject under example/csv provides a CSV adapter, which is +fully functional for use in applications but is also simple enough to +serve as a good template if you are writing your own adapter. + +See the <a href="{{ site.baseurl }}/docs/tutorial.html">tutorial</a> for information on using +the CSV adapter and writing other adapters. + +See the <a href="howto.html">HOWTO</a> for more information about +using other adapters, and about using Calcite in general. + +## Status + +The following features are complete. + +* Query parser, validator and optimizer +* Support for reading models in JSON format +* Many standard functions and aggregate functions +* JDBC queries against Linq4j and JDBC back-ends +* Linq4j front-end +* SQL features: SELECT, FROM (including JOIN syntax), WHERE, GROUP BY + (including GROUPING SETS), aggregate functions (including + COUNT(DISTINCT ...) and FILTER), HAVING, ORDER BY (including NULLS + FIRST/LAST), set operations (UNION, INTERSECT, MINUS), sub-queries + (including correlated sub-queries), windowed aggregates, LIMIT + (syntax as <a + href="http://www.postgresql.org/docs/8.4/static/sql-select.html#SQL-LIMIT">Postgres</a>); + more details in the [SQL reference](reference.html) +* Local and remote JDBC drivers; see [Avatica](avatica_overview.html) +* Several [adapters](adapter.html) + + http://git-wip-us.apache.org/repos/asf/calcite/blob/5cee486f/avatica/site/_docs/lattice.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/lattice.md b/avatica/site/_docs/lattice.md new file mode 100644 index 0000000..918261c --- /dev/null +++ b/avatica/site/_docs/lattice.md @@ -0,0 +1,136 @@ +--- +layout: docs +title: Lattices +permalink: /docs/lattice.html +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +A lattice is a framework for creating and populating materialized views, +and for recognizing that a materialized view can be used to solve a +particular query. + +A lattice represents a star (or snowflake) schema, not a general +schema. In particular, all relationships must be many-to-one, heading +from a fact table at the center of the star. + +The name derives from the mathematics: a +<a href="http://en.wikipedia.org/wiki/Lattice_(order)">lattice</a> +is a +<a href="http://en.wikipedia.org/wiki/Partially_ordered_set">partially +ordered set</a> where any two elements have a unique greatest lower +bound and least upper bound. + +[<a href="#ref-hru96">HRU96</a>] observed that the set of possible +materializations of a data cube forms a lattice, and presented an +algorithm to choose a good set of materializations. Calcite's +recommendation algorithm is derived from this. + +The lattice definition uses a SQL statement to represent the star. SQL +is a useful short-hand to represent several tables joined together, +and assigning aliases to the column names (it more convenient than +inventing a new language to represent relationships, join conditions +and cardinalities). + +Unlike regular SQL, order is important. If you put A before B in the +FROM clause, and make a join between A and B, you are saying that +there is a many-to-one foreign key relationship from A to B. (E.g. in +the example lattice, the Sales fact table occurs before the Time +dimension table, and before the Product dimension table. The Product +dimension table occurs before the ProductClass outer dimension table, +further down an arm of a snowflake.) + +A lattice implies constraints. In the A to B relationship, there is a +foreign key on A (i.e. every value of A's foreign key has a +corresponding value in B's key), and a unique key on B (i.e. no key +value occurs more than once). These constraints are really important, +because it allows the planner to remove joins to tables whose columns +are not being used, and know that the query results will not change. + +Calcite does not check these constraints. If they are violated, +Calcite will return wrong results. + +A lattice is a big, virtual join view. It is not materialized (it +would be several times larger than the star schema, because of +denormalization) and you probably wouldn't want to query it (far too +many columns). So what is it useful for? As we said above, (a) the +lattice declares some very useful primary and foreign key constraints, +(b) it helps the query planner map user queries onto +filter-join-aggregate materialized views (the most useful kind of +materialized view for DW queries), (c) gives Calcite a framework +within which to gather stats about data volumes and user queries, (d) +allows Calcite to automatically design and populate materialized +views. + +Most star schema models force you to choose whether a column is a +dimension or a measure. In a lattice, every column is a dimension +column. (That is, it can become one of the columns in the GROUP BY clause +to query the star schema at a particular dimensionality). Any column +can also be used in a measure; you define measures by giving the +column and an aggregate function. + +If "unit_sales" tends to be used much more often as a measure rather +than a dimension, that's fine. Calcite's algorithm should notice that +it is rarely aggregated, and not be inclined to create tiles that +aggregate on it. (By "should" I mean "could and one day will". The +algorithm does not currently take query history into account when +designing tiles.) + +But someone might want to know whether orders with fewer than 5 items +were more or less profitable than orders with more than 100. All of a +sudden, "unit_sales" has become a dimension. If there's virtually zero +cost to declaring a column a dimension column, I figured let's make +them all dimension columns. + +The model allows for a particular table to be used more than once, +with a different table alias. You could use this to model say +OrderDate and ShipDate, with two uses to the Time dimension table. + +Most SQL systems require that the column names in a view are unique. +This is hard to achieve in a lattice, because you often include +primary and foreign key columns in a join. So Calcite lets you refer +to columns in two ways. If the column is unique, you can use its name, +["unit_sales"]. Whether or not it is unique in the lattice, it will be +unique in its table, so you can use it qualified by its table alias. +Examples: + +* ["sales", "unit_sales"] +* ["ship_date", "time_id"] +* ["order_date", "time_id"] + +A "tile" is a materialized table in a lattice, with a particular +dimensionality. (What Kylin calls a "cuboid".) The "tiles" attribute +of the <a href="{{ site.baseurl }}/docs/model.html#lattice">lattice JSON element</a> +defines an initial set of tiles to materialize. + +If you run the algorithm, you can omit the tiles attribute. Calcite +will choose an initial set. If you include the tiles attribute, the +algorithm will start with that list and then start finding other tiles +that are complementary (i.e. "fill in the gaps" left by the initial +tiles). + +### References + +<ul> +<li>[<a name="ref-hru96">HRU96</a>] V. Harinarayan, A. Rajaraman and J. Ullman. + <a href="http://web.eecs.umich.edu/~jag/eecs584/papers/implementing_data_cube.pdf">Implementing + data cubes efficiently</a>. + In <i>Proc. ACM SIGMOD Conf.</i>, Montreal, 1996.</li> +</ul> + http://git-wip-us.apache.org/repos/asf/calcite/blob/5cee486f/avatica/site/_docs/model.md ---------------------------------------------------------------------- diff --git a/avatica/site/_docs/model.md b/avatica/site/_docs/model.md new file mode 100644 index 0000000..2675453 --- /dev/null +++ b/avatica/site/_docs/model.md @@ -0,0 +1,517 @@ +--- +layout: docs +title: JSON models +permalink: /docs/model.html +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +Calcite models can be represented as JSON files. +This page describes the structure of those files. + +Models can also be built programmatically using the `Schema` SPI. + +## Elements + +### Root + +{% highlight json %} +{ + version: '1.0', + defaultSchema: 'mongo', + schemas: [ Schema... ] +} +{% endhighlight %} + +`version` (required string) must have value `1.0`. + +`defaultSchema` (optional string). If specified, it is +the name (case-sensitive) of a schema defined in this model, and will +become the default schema for connections to Calcite that use this model. + +`schemas` (optional list of <a href="#schema">Schema</a> elements). + +### Schema + +Occurs within `root.schemas`. + +{% highlight json %} +{ + name: 'foodmart', + path: ['lib'], + cache: true, + materializations: [ Materialization... ] +} +{% endhighlight %} + +`name` (required string) is the name of the schema. + +`type` (optional string, default `map`) indicates sub-type. Values are: + +* `map` for <a href="#map-schema">Map Schema</a> +* `custom` for <a href="#custom-schema">Custom Schema</a> +* `jdbc` for <a href="#jdbc-schema">JDBC Schema</a> + +`path` (optional list) is the SQL path that is used to +resolve functions used in this schema. If specified it must be a list, +and each element of the list must be either a string or a list of +strings. For example, + +{% highlight json %} + path: [ ['usr', 'lib'], 'lib' ] +{% endhighlight %} + +declares a path with two elements: the schema '/usr/lib' and the +schema '/lib'. Most schemas are at the top level, and for these you can use a +string. + +`materializations` (optional list of +<a href="#materialization">Materialization</a>) defines the tables +in this schema that are materializations of queries. + +`cache` (optional boolean, default true) tells Calcite whether to +cache metadata (tables, functions and sub-schemas) generated +by this schema. + +* If `false`, Calcite will go back to the schema each time it needs + metadata, for example, each time it needs a list of tables in order to + validate a query against the schema. + +* If `true`, Calcite will cache the metadata the first time it reads + it. This can lead to better performance, especially if name-matching is + case-insensitive. + +However, it also leads to the problem of cache staleness. +A particular schema implementation can override the +`Schema.contentsHaveChangedSince` method to tell Calcite +when it should consider its cache to be out of date. + +Tables, functions and sub-schemas explicitly created in a schema are +not affected by this caching mechanism. They always appear in the schema +immediately, and are never flushed. + +### Map Schema + +Like base class <a href="#schema">Schema</a>, occurs within `root.schemas`. + +{% highlight json %} +{ + name: 'foodmart', + type: 'map', + tables: [ Table... ], + functions: [ Function... ] +} +{% endhighlight %} + +`name`, `type`, `path`, `cache`, `materializations` inherited from +<a href="#schema">Schema</a>. + +`tables` (optional list of <a href="#table">Table</a> elements) +defines the tables in this schema. + +`functions` (optional list of <a href="#function">Function</a> elements) +defines the functions in this schema. + +### Custom Schema + +Like base class <a href="#schema">Schema</a>, occurs within `root.schemas`. + +{% highlight json %} +{ + name: 'mongo', + type: 'custom', + factory: 'org.apache.calcite.adapter.mongodb.MongoSchemaFactory', + operand: { + host: 'localhost', + database: 'test' + } +} +{% endhighlight %} + +`name`, `type`, `path`, `cache`, `materializations` inherited from +<a href="#schema">Schema</a>. + +`factory` (required string) is the name of the factory class for this +schema. Must implement interface +[org.apache.calcite.schema.SchemaFactory]({{ site.apiRoot }}/org/apache/calcite/schema/SchemaFactory.html) +and have a public default constructor. + +`operand` (optional map) contains attributes to be passed to the +factory. + +### JDBC Schema + +Like base class <a href="#schema">Schema</a>, occurs within `root.schemas`. + +{% highlight json %} +{ + name: 'foodmart', + type: 'jdbc', + jdbcDriver: TODO, + jdbcUrl: TODO, + jdbcUser: TODO, + jdbcPassword: TODO, + jdbcCatalog: TODO, + jdbcSchema: TODO +} +{% endhighlight %} + +`name`, `type`, `path`, `cache`, `materializations` inherited from +<a href="#schema">Schema</a>. + +`jdbcDriver` (optional string) is the name of the JDBC driver class. If not +specified, uses whichever class the JDBC DriverManager chooses. + +`jdbcUrl` (optional string) is the JDBC connect string, for example +"jdbc:mysql://localhost/foodmart". + +`jdbcUser` (optional string) is the JDBC user name. + +`jdbcPassword` (optional string) is the JDBC password. + +`jdbcCatalog` (optional string) is the name of the initial catalog in the JDBC +data source. + +`jdbcSchema` (optional string) is the name of the initial schema in the JDBC +data source. + +### Materialization + +Occurs within `root.schemas.materializations`. + +{% highlight json %} +{ + view: 'V', + table: 'T', + sql: 'select deptno, count(*) as c, sum(sal) as s from emp group by deptno' +} +{% endhighlight %} + +`view` (optional string) is the name of the view; null means that the table +already exists and is populated with the correct data. + +`table` (required string) is the name of the table that materializes the data in +the query. If `view` is not null, the table might not exist, and if it does not, +Calcite will create and populate an in-memory table. + +`sql` (optional string, or list of strings that will be concatenated as a + multi-line string) is the SQL definition of the materialization. + +### Table + +Occurs within `root.schemas.tables`. + +{% highlight json %} +{ + name: 'sales_fact', + columns: [ Column... ] +} +{% endhighlight %} + +`name` (required string) is the name of this table. Must be unique within the schema. + +`type` (optional string, default `custom`) indicates sub-type. Values are: + +* `custom` for <a href="#custom-table">Custom Table</a> +* `view` for <a href="#view">View</a> + +`columns` (list of <a href="#column">Column</a> elements, required for +some kinds of table, optional for others such as View) + +### View + +Like base class <a href="#table">Table</a>, occurs within `root.schemas.tables`. + +{% highlight json %} +{ + name: 'female_emps', + type: 'view', + sql: "select * from emps where gender = 'F'", + modifiable: true +} +{% endhighlight %} + +`name`, `type`, `columns` inherited from <a href="#table">Table</a>. + +`sql` (required string, or list of strings that will be concatenated as a + multi-line string) is the SQL definition of the view. + +`path` (optional list) is the SQL path to resolve the query. If not +specified, defaults to the current schema. + +`modifiable` (optional boolean) is whether the view is modifiable. +If null or not specified, Calcite deduces whether the view is modifiable. + +A view is modifiable if contains only SELECT, FROM, WHERE (no JOIN, aggregation +or sub-queries) and every column: + +* is specified once in the SELECT clause; or +* occurs in the WHERE clause with a `column = literal` predicate; or +* is nullable. + +The second clause allows Calcite to automatically provide the correct value for +hidden columns. It is useful in multi-tenant environments, where the `tenantId` +column is hidden, mandatory (NOT NULL), and has a constant value for a +particular view. + +Errors regarding modifiable views: + +* If a view is marked `modifiable: true` and is not modifiable, Calcite throws + an error while reading the schema. +* If you submit an INSERT, UPDATE or UPSERT command to a non-modifiable view, + Calcite throws an error when validating the statement. +* If a DML statement creates a row that would not appear in the view + (for example, a row in `female_emps`, above, with `gender = 'M'`), + Calcite throws an error when executing the statement. + +### Custom Table + +Like base class <a href="#table">Table</a>, occurs within `root.schemas.tables`. + +{% highlight json %} +{ + name: 'female_emps', + type: 'custom', + factory: 'TODO', + operand: { + todo: 'TODO' + } +} +{% endhighlight %} + +`name`, `type`, `columns` inherited from <a href="#table">Table</a>. + +`factory` (required string) is the name of the factory class for this +table. Must implement interface +[org.apache.calcite.schema.TableFactory]({{ site.apiRoot }}/org/apache/calcite/schema/TableFactory.html) +and have a public default constructor. + +`operand` (optional map) contains attributes to be passed to the +factory. + +### Stream + +Information about whether a table allows streaming. + +Occurs within `root.schemas.tables.stream`. + +{% highlight json %} +{ + stream: true, + history: false +} +{% endhighlight %} + +`stream` (optional; default true) is whether the table allows streaming. + +`history` (optional; default false) is whether the history of the stream is +available. + +### Column + +Occurs within `root.schemas.tables.columns`. + +{% highlight json %} +{ + name: 'empno' +} +{% endhighlight %} + +`name` (required string) is the name of this column. + +### Function + +Occurs within `root.schemas.functions`. + +{% highlight json %} +{ + name: 'MY_PLUS', + className: 'com.example.functions.MyPlusFunction', + methodName: 'apply', + path: [] +} +{% endhighlight %} + +`name` (required string) is the name of this function. + +`className` (required string) is the name of the class that implements this +function. + +`methodName` (optional string) is the name of the method that implements this +function. + +If `methodName` is specified, the method must exist (case-sensitive) and Calcite +will create a scalar function. The method may be static or non-static, but +if non-static, the class must have a public constructor with no parameters. + +If `methodName` is "*", Calcite creates a function for every method +in the class. + +If `methodName` is not specified, Calcite looks for a method called "eval", and +if found, creates a a table macro or scalar function. +It also looks for methods "init", "add", "merge", "result", and +if found, creates an aggregate function. + +`path` (optional list of string) is the path for resolving this function. + +### Lattice + +Occurs within `root.schemas.lattices`. + +{% highlight json %} +{ + name: 'star', + sql: [ + 'select 1 from "foodmart"."sales_fact_1997" as "s"', + 'join "foodmart"."product" as "p" using ("product_id")', + 'join "foodmart"."time_by_day" as "t" using ("time_id")', + 'join "foodmart"."product_class" as "pc" on "p"."product_class_id" = "pc"."product_class_id"' + ], + auto: false, + algorithm: true, + algorithmMaxMillis: 10000, + rowCountEstimate: 86837, + defaultMeasures: [ { + agg: 'count' + } ], + tiles: [ { + dimensions: [ 'the_year', ['t', 'quarter'] ], + measures: [ { + agg: 'sum', + args: 'unit_sales' + }, { + agg: 'sum', + args: 'store_sales' + }, { + agg: 'count' + } ] + } ] +} +{% endhighlight %} + +`name` (required string) is the name of this lattice. + +`sql` (required string, or list of strings that will be concatenated as a +multi-line string) is the SQL statement that defines the fact table, dimension +tables, and join paths for this lattice. + +`auto` (optional boolean, default true) is whether to materialize tiles on need +as queries are executed. + +`algorithm` (optional boolean, default false) is whether to use an optimization +algorithm to suggest and populate an initial set of tiles. + +`algorithmMaxMillis` (optional long, default -1, meaning no limit) is the +maximum number of milliseconds for which to run the algorithm. After this point, +takes the best result the algorithm has come up with so far. + +`rowCountEstimate` (optional double, default 1000.0) estimated number of rows in +the lattice + +`tiles` (optional list of <a href="#tile">Tile</a> elements) is a list of +materialized aggregates to create up front. + +`defaultMeasures` (optional list of <a href="#measure">Measure</a> elements) +is a list of measures that a tile should have by default. +Any tile defined in `tiles` can still define its own measures, including +measures not on this list. If not specified, the default list of measures is +just 'count(*)': + +{% highlight json %} +[ { name: 'count' } ] +{% endhighlight %} + +`statisticProvider` (optional name of a class that implements +[org.apache.calcite.materialize.LatticeStatisticProvider]({{ site.apiRoot }}/org/apache/calcite/materialize/LatticeStatisticProvider.html)) +provides estimates of the number of distinct values in each column. + +You can use a class name, or a class plus a static field. Example: + +{% highlight json %} + "statisticProvider": "org.apache.calcite.materialize.Lattices#CACHING_SQL_STATISTIC_PROVIDER" +{% endhighlight %} + +If not set, Calcite will generate and execute a SQL query to find the real +value, and cache the results. + +See also: <a href="{{ site.baseurl }}/docs/lattice.html">Lattices</a>. + +### Tile + +Occurs within `root.schemas.lattices.tiles`. + +{% highlight json %} +{ + dimensions: [ 'the_year', ['t', 'quarter'] ], + measures: [ { + agg: 'sum', + args: 'unit_sales' + }, { + agg: 'sum', + args: 'store_sales' + }, { + agg: 'count' + } ] +} +{% endhighlight %} + +`dimensions` (list of strings or string lists, required, but may be empty) +defines the dimensionality of this tile. +Each dimension is a column from the lattice, like a `GROUP BY` clause. +Each element can be either a string +(the unique label of the column within the lattice) +or a string list (a pair consisting of a table alias and a column name). + +`measures` (optional list of <a href="#measure">Measure</a> elements) is a list +of aggregate functions applied to arguments. If not specified, uses the +lattice's default measure list. + +### Measure + +Occurs within `root.schemas.lattices.defaultMeasures` +and `root.schemas.lattices.tiles.measures`. + +{% highlight json %} +{ + agg: 'sum', + args: [ 'unit_sales' ] +} +{% endhighlight %} + +`agg` is the name of an aggregate function (usually 'count', 'sum', 'min', +'max'). + +`args` (optional) is a column label (string), or list of zero or more column +labels + +Valid values are: + +* Not specified: no arguments +* null: no arguments +* Empty list: no arguments +* String: single argument, the name of a lattice column +* List: multiple arguments, each a column label + +Unlike lattice dimensions, measures can not be specified in qualified +format, {@code ["table", "column"]}. When you define a lattice, make sure +that each column you intend to use as a measure has a unique label within +the lattice (using "{@code AS label}" if necessary), and use that label +when you want to pass the column as a measure argument. + +<!-- End model.md -->
