This is an automated email from the ASF dual-hosted git repository.
fjy pushed a commit to branch 0.15.0-incubating
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/0.15.0-incubating by this push:
new f30fef3 Added the web console to the quickstart tutorials and docs
(#7863) (#7913)
f30fef3 is described below
commit f30fef3b40b0845e6d7fa311b009c36de3e44b00
Author: Jihoon Son <[email protected]>
AuthorDate: Mon Jun 17 18:48:21 2019 -0700
Added the web console to the quickstart tutorials and docs (#7863) (#7913)
* added console to the quickstart tutorials
* feedback fixes
* feedback fixes
* more typo fixes
* moved resetting cluster section after load data
* update images
* stage -> step
* feedback fixes
* more feedback fixes
---
docs/content/tutorials/img/tutorial-batch-01.png | Bin 54435 -> 0 bytes
.../img/tutorial-batch-data-loader-01.png | Bin 0 -> 99355 bytes
.../img/tutorial-batch-data-loader-02.png | Bin 0 -> 521148 bytes
.../img/tutorial-batch-data-loader-03.png | Bin 0 -> 217008 bytes
.../img/tutorial-batch-data-loader-04.png | Bin 0 -> 261225 bytes
.../img/tutorial-batch-data-loader-05.png | Bin 0 -> 256368 bytes
.../img/tutorial-batch-data-loader-06.png | Bin 0 -> 105983 bytes
.../img/tutorial-batch-data-loader-07.png | Bin 0 -> 81399 bytes
.../img/tutorial-batch-data-loader-08.png | Bin 0 -> 162397 bytes
.../img/tutorial-batch-data-loader-09.png | Bin 0 -> 107662 bytes
.../img/tutorial-batch-data-loader-10.png | Bin 0 -> 79080 bytes
.../img/tutorial-batch-data-loader-11.png | Bin 0 -> 133329 bytes
.../img/tutorial-batch-submit-task-01.png | Bin 0 -> 113916 bytes
.../img/tutorial-batch-submit-task-02.png | Bin 0 -> 136268 bytes
docs/content/tutorials/img/tutorial-kafka-01.png | Bin 0 -> 136317 bytes
docs/content/tutorials/img/tutorial-kafka-02.png | Bin 0 -> 125452 bytes
docs/content/tutorials/img/tutorial-query-01.png | Bin 0 -> 153120 bytes
docs/content/tutorials/img/tutorial-query-02.png | Bin 0 -> 129962 bytes
docs/content/tutorials/img/tutorial-query-03.png | Bin 0 -> 106082 bytes
docs/content/tutorials/img/tutorial-query-04.png | Bin 0 -> 108331 bytes
docs/content/tutorials/img/tutorial-query-05.png | Bin 0 -> 87070 bytes
docs/content/tutorials/img/tutorial-query-06.png | Bin 0 -> 130612 bytes
docs/content/tutorials/img/tutorial-query-07.png | Bin 0 -> 125457 bytes
.../tutorials/img/tutorial-quickstart-01.png | Bin 0 -> 56955 bytes
docs/content/tutorials/index.md | 54 ++--
docs/content/tutorials/tutorial-batch.md | 137 +++++++--
docs/content/tutorials/tutorial-kafka.md | 86 +++++-
docs/content/tutorials/tutorial-query.md | 329 +++++++++++----------
28 files changed, 389 insertions(+), 217 deletions(-)
diff --git a/docs/content/tutorials/img/tutorial-batch-01.png
b/docs/content/tutorials/img/tutorial-batch-01.png
deleted file mode 100644
index dc506dd..0000000
Binary files a/docs/content/tutorials/img/tutorial-batch-01.png and /dev/null
differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-01.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-01.png
new file mode 100644
index 0000000..b0b5da8
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-01.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-02.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-02.png
new file mode 100644
index 0000000..806ce4c
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-02.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-03.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-03.png
new file mode 100644
index 0000000..c6bb701
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-03.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-04.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-04.png
new file mode 100644
index 0000000..83a018b
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-04.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-05.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-05.png
new file mode 100644
index 0000000..71291c0
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-05.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-06.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-06.png
new file mode 100644
index 0000000..5fe9c37
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-06.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-07.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-07.png
new file mode 100644
index 0000000..16b48af
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-07.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-08.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-08.png
new file mode 100644
index 0000000..edaf039
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-08.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-09.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-09.png
new file mode 100644
index 0000000..6191fc2
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-09.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-10.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-10.png
new file mode 100644
index 0000000..4037792
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-10.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-data-loader-11.png
b/docs/content/tutorials/img/tutorial-batch-data-loader-11.png
new file mode 100644
index 0000000..76464f9
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-data-loader-11.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-submit-task-01.png
b/docs/content/tutorials/img/tutorial-batch-submit-task-01.png
new file mode 100644
index 0000000..1651401
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-submit-task-01.png differ
diff --git a/docs/content/tutorials/img/tutorial-batch-submit-task-02.png
b/docs/content/tutorials/img/tutorial-batch-submit-task-02.png
new file mode 100644
index 0000000..834a9a5
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-batch-submit-task-02.png differ
diff --git a/docs/content/tutorials/img/tutorial-kafka-01.png
b/docs/content/tutorials/img/tutorial-kafka-01.png
new file mode 100644
index 0000000..580d9af
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-kafka-01.png
differ
diff --git a/docs/content/tutorials/img/tutorial-kafka-02.png
b/docs/content/tutorials/img/tutorial-kafka-02.png
new file mode 100644
index 0000000..735ceaa
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-kafka-02.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-01.png
b/docs/content/tutorials/img/tutorial-query-01.png
new file mode 100644
index 0000000..7e483fc
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-01.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-02.png
b/docs/content/tutorials/img/tutorial-query-02.png
new file mode 100644
index 0000000..c25c651
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-02.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-03.png
b/docs/content/tutorials/img/tutorial-query-03.png
new file mode 100644
index 0000000..5b1e5bc
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-03.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-04.png
b/docs/content/tutorials/img/tutorial-query-04.png
new file mode 100644
index 0000000..df96420
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-04.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-05.png
b/docs/content/tutorials/img/tutorial-query-05.png
new file mode 100644
index 0000000..c241627
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-05.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-06.png
b/docs/content/tutorials/img/tutorial-query-06.png
new file mode 100644
index 0000000..1f3e5fb
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-06.png
differ
diff --git a/docs/content/tutorials/img/tutorial-query-07.png
b/docs/content/tutorials/img/tutorial-query-07.png
new file mode 100644
index 0000000..e23fc2a
Binary files /dev/null and b/docs/content/tutorials/img/tutorial-query-07.png
differ
diff --git a/docs/content/tutorials/img/tutorial-quickstart-01.png
b/docs/content/tutorials/img/tutorial-quickstart-01.png
new file mode 100644
index 0000000..94b2024
Binary files /dev/null and
b/docs/content/tutorials/img/tutorial-quickstart-01.png differ
diff --git a/docs/content/tutorials/index.md b/docs/content/tutorials/index.md
index dd05213..a1cc4ef 100644
--- a/docs/content/tutorials/index.md
+++ b/docs/content/tutorials/index.md
@@ -35,8 +35,9 @@ Before beginning the quickstart, it is helpful to read the
[general Druid overvi
### Software
You will need:
- * Java 8 (8u92+)
- * Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
+
+* Java 8 (8u92+)
+* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
### Hardware
@@ -116,21 +117,13 @@ All persistent state such as the cluster metadata store
and segments for the ser
Later on, if you'd like to stop the services, CTRL-C to exit the
`bin/start-micro-quickstart` script, which will terminate the Druid processes.
-### Resetting cluster state
-
-If you want a clean start after stopping the services, delete the `var`
directory and run the `bin/start-micro-quickstart` script again.
-
-Once every service has started, you are now ready to load data.
+Once the cluster has started, you can navigate to
[http://localhost:8888](http://localhost:8888).
+The [Druid router process](../development/router.html), which serves the Druid
console, resides at this address.
-#### Resetting Kafka
+
-If you completed [Tutorial: Loading stream data from
Kafka](./tutorial-kafka.html) and wish to reset the cluster state, you should
additionally clear out any Kafka state.
+It takes a few seconds for all the Druid processes to fully start up. If you
open the console immediately after starting the services, you may see some
errors that you can safely ignore.
-Shut down the Kafka broker with CTRL-C before stopping Zookeeper and the Druid
services, and then delete the Kafka log directory at `/tmp/kafka-logs`:
-
-```bash
-rm -rf /tmp/kafka-logs
-```
## Loading Data
@@ -138,7 +131,8 @@ rm -rf /tmp/kafka-logs
For the following data loading tutorials, we have included a sample data file
containing Wikipedia page edit events that occurred on 2015-09-12.
-This sample data is located at
`quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid
package root. The page edit events are stored as JSON objects in a text file.
+This sample data is located at
`quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid
package root.
+The page edit events are stored as JSON objects in a text file.
The sample data has the following columns, and an example event is shown below:
@@ -186,25 +180,31 @@ The sample data has the following columns, and an example
event is shown below:
}
```
-The following tutorials demonstrate various methods of loading data into
Druid, including both batch and streaming use
-cases. All tutorials assume that you are using the `micro-quickstart`
single-machine configuration mentioned above.
-### [Tutorial: Loading a file](./tutorial-batch.html)
+### Data loading tutorials
-This tutorial demonstrates how to perform a batch file load, using Druid's
native batch ingestion.
+The following tutorials demonstrate various methods of loading data into
Druid, including both batch and streaming use cases.
+All tutorials assume that you are using the `micro-quickstart` single-machine
configuration mentioned above.
-### [Tutorial: Loading stream data from Apache Kafka](./tutorial-kafka.html)
+- [Loading a file](./tutorial-batch.html) - this tutorial demonstrates how to
perform a batch file load, using Druid's native batch ingestion.
+- [Loading stream data from Apache Kafka](./tutorial-kafka.html) - this
tutorial demonstrates how to load streaming data from a Kafka topic.
+- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.html) - this
tutorial demonstrates how to perform a batch file load, using a remote Hadoop
cluster.
+- [Loading data using Tranquility](./tutorial-tranquility.html) - this
tutorial demonstrates how to load streaming data by pushing events to Druid
using the Tranquility service.
+- [Writing your own ingestion spec](./tutorial-ingestion-spec.html) - this
tutorial demonstrates how to write a new ingestion spec and use it to load data.
-This tutorial demonstrates how to load streaming data from a Kafka topic.
-### [Tutorial: Loading a file using Apache
Hadoop](./tutorial-batch-hadoop.html)
+### Resetting cluster state
-This tutorial demonstrates how to perform a batch file load, using a remote
Hadoop cluster.
+If you want a clean start after stopping the services, delete the `var`
directory and run the `bin/start-micro-quickstart` script again.
-### [Tutorial: Loading data using Tranquility](./tutorial-tranquility.html)
+Once every service has started, you are now ready to load data.
-This tutorial demonstrates how to load streaming data by pushing events to
Druid using the Tranquility service.
+#### Resetting Kafka
-### [Tutorial: Writing your own ingestion spec](./tutorial-ingestion-spec.html)
+If you completed [Tutorial: Loading stream data from
Kafka](./tutorial-kafka.html) and wish to reset the cluster state, you should
additionally clear out any Kafka state.
-This tutorial demonstrates how to write a new ingestion spec and use it to
load data.
+Shut down the Kafka broker with CTRL-C before stopping Zookeeper and the Druid
services, and then delete the Kafka log directory at `/tmp/kafka-logs`:
+
+```bash
+rm -rf /tmp/kafka-logs
+```
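The reset procedure described above can be sketched in Python as follows. This is a minimal illustration and not part of the Druid package: the `reset_cluster_state` helper and its parameters are hypothetical, and it assumes the Druid and Kafka services have already been stopped.

```python
import shutil
from pathlib import Path


def reset_cluster_state(druid_root, kafka_log_dir=None):
    """Delete the quickstart `var` directory and, optionally, the Kafka log directory.

    Hypothetical helper: returns the list of directories that were actually removed.
    """
    # The `var` directory lives directly under the Druid package root.
    targets = [Path(druid_root) / "var"]
    if kafka_log_dir is not None:
        targets.append(Path(kafka_log_dir))

    removed = []
    for target in targets:
        # Skip anything that does not exist so the helper is safe to re-run.
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(str(target))
    return removed
```

From the Druid package root, a call such as `reset_cluster_state(".", "/tmp/kafka-logs")` would mirror the manual steps; leaving out the second argument resets only the Druid state.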
diff --git a/docs/content/tutorials/tutorial-batch.md
b/docs/content/tutorials/tutorial-batch.md
index aab7694..1d47123 100644
--- a/docs/content/tutorials/tutorial-batch.md
+++ b/docs/content/tutorials/tutorial-batch.md
@@ -24,18 +24,104 @@ title: "Tutorial: Loading a file"
# Tutorial: Loading a file
-## Getting started
-
This tutorial demonstrates how to perform a batch file load, using Apache
Druid (incubating)'s native batch ingestion.
For this tutorial, we'll assume you've already downloaded Druid as described
in
the [quickstart](index.html) using the `micro-quickstart` single-machine
configuration and have it
running on your local machine. You don't need to have loaded any data yet.
-## Preparing the data and the ingestion task spec
-
A data load is initiated by submitting an *ingestion task* spec to the Druid
Overlord. For this tutorial, we'll be loading the sample Wikipedia page edits
data.
+An ingestion spec can be written by hand or by using the "Data loader" that is
built into the Druid console.
+The data loader can help you build an ingestion spec by sampling your data and
iteratively configuring various ingestion parameters.
+The data loader currently only supports native batch ingestion (support for
streaming, including data stored in Apache Kafka and AWS Kinesis, is coming in
future releases).
+Streaming ingestion is only available through a written ingestion spec today.
+
+We've included a sample of Wikipedia edits from September 12, 2015 to get you
started.
+
+
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888) and click `Load data` in
the console header.
+Select `Local disk`.
+
+
+
+Enter the value of `quickstart/tutorial/` as the base directory and
`wikiticker-2015-09-12-sampled.json.gz` as a filter.
+The separation of base directory and [wildcard file
filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html)
is there if you need to ingest data from multiple files.
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+
+
+Once the data is located, you can click "Next: Parse data" to go to the next
step.
+The data loader will try to automatically determine the correct parser for the
data.
+In this case it will successfully determine `json`.
+Feel free to play around with different parser options to get a preview of how
Druid will parse your data.
+
+
+
+With the `json` parser selected, click `Next: Parse time` to get to the step
centered around determining your primary timestamp column.
+Druid's architecture requires a primary timestamp column (internally stored in
a column called `__time`).
+If you do not have a timestamp in your data, select `Constant value`.
+In our example, the data loader will determine that the `time` column in our
raw data is the only candidate that can be used as the primary time column.
+
+
+
+Click `Next: ...` twice to go past the `Transform` and `Filter` steps.
+You do not need to enter anything in these steps, as applying ingestion-time
transforms and filters is out of scope for this tutorial.
+
+In the `Configure schema` step, you can configure which dimensions (and
metrics) will be ingested into Druid.
+This is exactly how the data will appear in Druid once it is ingested.
+Since our dataset is very small, go ahead and turn off `Rollup` by clicking on
the switch and confirming the change.
+
+
+
+Once you are satisfied with the schema, click `Next` to go to the `Partition`
step, where you can fine-tune how the data will be partitioned into segments.
+Since this is a small dataset, there are no adjustments that need to be made
in this step.
+
+
+
+Clicking past the `Tune` step, we get to the publish step, which is where we
can specify the datasource name in Druid.
+Let's name this datasource `wikipedia`.
+
+
+
+Finally, click `Next` to review your spec.
+This is the spec you have constructed.
+Feel free to go back and make changes in previous steps to see how changes
will update the spec.
+Similarly, you can also edit the spec directly and see it reflected in the
previous steps.
+
+
+
+Once you are satisfied with the spec, click `Submit` and an ingestion task
will be created.
+
+You will be taken to the task view with the focus on the newly created task.
+
+
+
+In the tasks view, you can click `Refresh` a couple of times until your
ingestion task (hopefully) succeeds.
+
+When a task succeeds, it means that it built one or more segments that will
now be picked up by the data servers.
+
+Navigate to the `Datasources` view and click `Refresh` until your datasource
(`wikipedia`) appears.
+This can take a few seconds as the segments are being loaded.
+
+
+
+A datasource is queryable once you see a green (fully available) circle.
+At this point, you can go to the `Query` view to run SQL queries against the
datasource.
+
+Since this is a small dataset, you can simply run a `SELECT * FROM wikipedia`
query to see your results.
+
+
+
+Check out the [query tutorial](../tutorials/tutorial-query.html) to run some
example queries on the newly loaded data.
+
+
+## Loading data with a spec (via console)
+
The Druid package includes the following sample native batch ingestion task
spec at `quickstart/tutorial/wikipedia-index.json`, shown here for convenience,
which has been configured to read the
`quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` input file:
@@ -105,14 +191,20 @@ which has been configured to read the
`quickstart/tutorial/wikiticker-2015-09-12
}
```
-This spec will create a datasource named "wikipedia",
+This spec will create a datasource named "wikipedia".
-## Load batch data
+From the task view, click on `Submit task` and select `Raw JSON task`.
-We've included a sample of Wikipedia edits from September 12, 2015 to get you
started.
+
+
+This will bring up the spec submission dialog where you can paste the spec
above.
+
+
-To load this data into Druid, you can submit an *ingestion task* pointing to
the file. We've included
-a task that loads the `wikiticker-2015-09-12-sampled.json.gz` file included in
the archive.
+Once the spec is submitted, you can follow the same instructions as above to
wait for the data to load and then query it.
+
+
+## Loading data with a spec (via command line)
For convenience, the Druid package includes a batch ingestion helper script at
`bin/post-index-task`.
@@ -138,15 +230,10 @@ Completed indexing data for wikipedia. Now loading
indexed data onto the cluster
wikipedia loading complete! You may now query your data
```
-## Querying your data
+Once the spec is submitted, you can follow the same instructions as above to
wait for the data to load and then query it.
-Once the data is loaded, please follow the [query
tutorial](../tutorials/tutorial-query.html) to run some example queries on the
newly loaded data.
-
-## Cleanup
-If you wish to go through any of the other ingestion tutorials, you will need
to shut down the cluster and reset the cluster state by removing the contents
of the `var` directory under the druid package, as the other tutorials will
write to the same "wikipedia" datasource.
-
-## Extra: Loading data without the script
+## Loading data without the script
Let's briefly discuss how we would've submitted the ingestion task without
using the script. You do not need to run these commands.
@@ -162,16 +249,18 @@ Which will print the ID of the task if the submission was
successful:
{"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
```
-To view the status of the ingestion task, go to the Druid Console:
-[http://localhost:8888/](http://localhost:8888). You can refresh the console
periodically, and after
-the task is successful, you should see a "SUCCESS" status for the task under
the [Tasks view](http://localhost:8888/unified-console.html#tasks).
+You can monitor the status of this task from the console as outlined above.
+
+
+## Querying your data
+
+Once the data is loaded, please follow the [query
tutorial](../tutorials/tutorial-query.html) to run some example queries on the
newly loaded data.
+
-After the ingestion task finishes, the data will be loaded by Historical
processes and available for
-querying within a minute or two. You can monitor the progress of loading the
data in the
-Datasources view, by checking whether there is a datasource "wikipedia" with a
green circle
-indicating "fully available":
[http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources).
+## Cleanup
+
+If you wish to go through any of the other ingestion tutorials, you will need
to shut down the cluster and reset the cluster state by removing the contents
of the `var` directory under the druid package, as the other tutorials will
write to the same "wikipedia" datasource.
-
## Further reading
diff --git a/docs/content/tutorials/tutorial-kafka.md
b/docs/content/tutorials/tutorial-kafka.md
index 3f6a9a1..0dc91cf 100644
--- a/docs/content/tutorials/tutorial-kafka.md
+++ b/docs/content/tutorials/tutorial-kafka.md
@@ -56,10 +56,87 @@ Run this command to create a Kafka topic called
*wikipedia*, to which we'll send
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor
1 --partitions 1 --topic wikipedia
```
-## Enable Druid Kafka ingestion
+## Start Druid Kafka ingestion
+
+We will use Druid's Kafka indexing service to ingest messages from our newly
created *wikipedia* topic.
+
+### Submit a supervisor via the console
+
+In the console, click `Submit supervisor` to open the submit supervisor dialog.
+
+
+
+Paste in this spec and click `Submit`.
+
+```json
+{
+ "type": "kafka",
+ "dataSchema": {
+ "dataSource": "wikipedia",
+ "parser": {
+ "type": "string",
+ "parseSpec": {
+ "format": "json",
+ "timestampSpec": {
+ "column": "time",
+ "format": "auto"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+ "channel",
+ "cityName",
+ "comment",
+ "countryIsoCode",
+ "countryName",
+ "isAnonymous",
+ "isMinor",
+ "isNew",
+ "isRobot",
+ "isUnpatrolled",
+ "metroCode",
+ "namespace",
+ "page",
+ "regionIsoCode",
+ "regionName",
+ "user",
+ { "name": "added", "type": "long" },
+ { "name": "deleted", "type": "long" },
+ { "name": "delta", "type": "long" }
+ ]
+ }
+ }
+ },
+ "metricsSpec" : [],
+ "granularitySpec": {
+ "type": "uniform",
+ "segmentGranularity": "DAY",
+ "queryGranularity": "NONE",
+ "rollup": false
+ }
+ },
+ "tuningConfig": {
+ "type": "kafka",
+ "reportParseExceptions": false
+ },
+ "ioConfig": {
+ "topic": "wikipedia",
+ "replicas": 2,
+ "taskDuration": "PT10M",
+ "completionTimeout": "PT20M",
+ "consumerProperties": {
+ "bootstrap.servers": "localhost:9092"
+ }
+ }
+}
+```
+
+This starts the supervisor, which in turn spawns tasks that listen for
incoming data.
+
+
-We will use Druid's Kafka indexing service to ingest messages from our newly
created *wikipedia* topic. To start the
-service, we will need to submit a supervisor spec to the Druid overlord by
running the following from the Druid package root:
+### Submit a supervisor directly
+
+To start the service directly, we will need to submit a supervisor spec to the
Druid overlord by running the following from the Druid package root:
```bash
curl -XPOST -H'Content-Type: application/json' -d
@quickstart/tutorial/wikipedia-kafka-supervisor.json
http://localhost:8081/druid/indexer/v1/supervisor
@@ -73,9 +150,10 @@ For more details about what's going on here, check out the
You can view the current supervisors and tasks in the Druid Console:
[http://localhost:8888/unified-console.html#tasks](http://localhost:8888/unified-console.html#tasks).
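
As a rough alternative to the `curl` command above, the same supervisor submission can be sketched with Python's standard library. The `build_supervisor_request` and `load_spec` helpers are hypothetical names introduced for illustration; the Overlord URL is the one used in this tutorial, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Overlord supervisor endpoint, as used by the curl command in this tutorial.
OVERLORD_SUPERVISOR_URL = "http://localhost:8081/druid/indexer/v1/supervisor"


def load_spec(path):
    """Read a supervisor spec (a JSON document) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_supervisor_request(spec, url=OVERLORD_SUPERVISOR_URL):
    """Build, but do not send, a POST request carrying the spec as JSON."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the cluster running, passing the result of `build_supervisor_request(load_spec("quickstart/tutorial/wikipedia-kafka-supervisor.json"))` to `urllib.request.urlopen` from the Druid package root would perform the actual submission.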
+
## Load data
-Let's launch a console producer for our topic and send some data!
+Let's launch a producer for our topic and send some data!
In your Druid directory, run the following command:
diff --git a/docs/content/tutorials/tutorial-query.md
b/docs/content/tutorials/tutorial-query.md
index 9829197..960655e 100644
--- a/docs/content/tutorials/tutorial-query.md
+++ b/docs/content/tutorials/tutorial-query.md
@@ -24,7 +24,7 @@ title: "Tutorial: Querying data"
# Tutorial: Querying data
-This tutorial will demonstrate how to query data in Apache Druid (incubating),
with examples for Druid's native query format and Druid SQL.
+This tutorial will demonstrate how to query data in Apache Druid (incubating),
with examples for Druid SQL and Druid's native query format.
The tutorial assumes that you've already completed one of the 4 ingestion
tutorials, as we will be querying the sample Wikipedia edits data.
@@ -33,91 +33,80 @@ The tutorial assumes that you've already completed one of
the 4 ingestion tutori
* [Tutorial: Loading a file using
Hadoop](../tutorials/tutorial-batch-hadoop.html)
* [Tutorial: Loading stream data using
Tranquility](../tutorials/tutorial-tranquility.html)
-## Native JSON queries
+Druid queries are sent over HTTP.
+The Druid console includes a view to issue queries to Druid and nicely format
the results.
-Druid's native query format is expressed in JSON. We have included a sample
native TopN query under `quickstart/tutorial/wikipedia-top-pages.json`:
+## Druid SQL queries
-```json
-{
- "queryType" : "topN",
- "dataSource" : "wikipedia",
- "intervals" : ["2015-09-12/2015-09-13"],
- "granularity" : "all",
- "dimension" : "page",
- "metric" : "count",
- "threshold" : 10,
- "aggregations" : [
- {
- "type" : "count",
- "name" : "count"
- }
- ]
-}
-```
+Druid supports a dialect of SQL for querying.
This query retrieves the 10 Wikipedia pages with the most page edits on
2015-09-12.
-Let's submit this query to the Druid Broker:
-
-```bash
-curl -X 'POST' -H 'Content-Type:application/json' -d
@quickstart/tutorial/wikipedia-top-pages.json
http://localhost:8082/druid/v2?pretty
+```sql
+SELECT page, COUNT(*) AS Edits
+FROM wikipedia
+WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP
'2015-09-13 00:00:00'
+GROUP BY page ORDER BY Edits DESC
+LIMIT 10
```
-You should see the following query results:
+Let's look at the different ways to issue this query.
-```json
-[ {
- "timestamp" : "2015-09-12T00:46:58.771Z",
- "result" : [ {
- "count" : 33,
- "page" : "Wikipedia:Vandalismusmeldung"
- }, {
- "count" : 28,
- "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
- }, {
- "count" : 27,
- "page" : "Jeremy Corbyn"
- }, {
- "count" : 21,
- "page" : "Wikipedia:Administrators' noticeboard/Incidents"
- }, {
- "count" : 20,
- "page" : "Flavia Pennetta"
- }, {
- "count" : 18,
- "page" : "Total Drama Presents: The Ridonculous Race"
- }, {
- "count" : 18,
- "page" : "User talk:Dudeperson176123"
- }, {
- "count" : 18,
- "page" : "Wikipédia:Le Bistro/12 septembre 2015"
- }, {
- "count" : 17,
- "page" : "Wikipedia:In the news/Candidates"
- }, {
- "count" : 17,
- "page" : "Wikipedia:Requests for page protection"
- } ]
-} ]
-```
+### Query SQL via the console
-## Druid SQL queries
+You can issue the above query from the console.
+
+
+
+The console query view provides autocomplete together with inline function
documentation.
+You can also configure extra context flags to be sent with the query from the
more options menu.
+
+
+
+Note that by default the console wraps your SQL queries in a limit, so that
you can issue queries like `SELECT * FROM wikipedia` without much hesitation.
You can turn off this behavior.
-Druid also supports a dialect of SQL for querying. Let's run a SQL query that
is equivalent to the native JSON query shown above:
+### Query SQL via dsql
+For convenience, the Druid package includes a SQL command-line client, located
at `bin/dsql` from the Druid package root.
+
+Let's now run `bin/dsql`; you should see the following prompt:
+
+```bash
+Welcome to dsql, the command-line client for Druid SQL.
+Type "\h" for help.
+dsql>
```
-SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP
'2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER
BY Edits DESC LIMIT 10;
+
+To submit the query, paste it to the `dsql` prompt and press enter:
+
+```bash
+dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN
TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY
page ORDER BY Edits DESC LIMIT 10;
+┌──────────────────────────────────────────────────────────┬───────┐
+│ page │ Edits │
+├──────────────────────────────────────────────────────────┼───────┤
+│ Wikipedia:Vandalismusmeldung │ 33 │
+│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
+│ Jeremy Corbyn │ 27 │
+│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
+│ Flavia Pennetta │ 20 │
+│ Total Drama Presents: The Ridonculous Race │ 18 │
+│ User talk:Dudeperson176123 │ 18 │
+│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
+│ Wikipedia:In the news/Candidates │ 17 │
+│ Wikipedia:Requests for page protection │ 17 │
+└──────────────────────────────────────────────────────────┴───────┘
+Retrieved 10 rows in 0.06s.
```
-The SQL queries are submitted as JSON over HTTP.
-### TopN query example
+### Query SQL over HTTP
+
+The SQL queries are submitted as JSON over HTTP.
The tutorial package includes an example file that contains the SQL query
shown above at `quickstart/tutorial/wikipedia-top-pages-sql.json`. Let's submit
that query to the Druid Broker:
```bash
-curl -X 'POST' -H 'Content-Type:application/json' -d
@quickstart/tutorial/wikipedia-top-pages-sql.json
http://localhost:8082/druid/v2/sql
+curl -X 'POST' -H 'Content-Type:application/json' -d
@quickstart/tutorial/wikipedia-top-pages-sql.json
http://localhost:8888/druid/v2/sql
```
The following results should be returned:
@@ -167,119 +156,51 @@ The following results should be returned:
]
```
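
For readers who prefer a script over `curl`, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the tutorial package: it assumes the quickstart Router is listening on `localhost:8888` and that the request body is a JSON object with a single `query` field holding the SQL text. The names `build_sql_request` and `run_sql` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# Assumption: the quickstart Router address used by the curl example above.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Wrap a SQL string in a JSON body and prepare a POST request."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, url=DRUID_SQL_URL):
    """Send the query and decode the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(build_sql_request(sql, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_sql(...)` with the top-pages query should return a list of row objects, provided the cluster from the quickstart is up and the `wikipedia` datasource has been loaded.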
-### dsql client
+### More Druid SQL examples
-For convenience, the Druid package includes a SQL command-line client, located
at `bin/dsql` from the Druid package root.
-
-Let's now run `bin/dsql`; you should see the following prompt:
-
-```bash
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql>
-```
+Here is a collection of queries to try out:
-To submit the query, paste it to the `dsql` prompt and press enter:
+#### Query over time
-```bash
-dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN
TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY
page ORDER BY Edits DESC LIMIT 10;
-┌──────────────────────────────────────────────────────────┬───────┐
-│ page │ Edits │
-├──────────────────────────────────────────────────────────┼───────┤
-│ Wikipedia:Vandalismusmeldung │ 33 │
-│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
-│ Jeremy Corbyn │ 27 │
-│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
-│ Flavia Pennetta │ 20 │
-│ Total Drama Presents: The Ridonculous Race │ 18 │
-│ User talk:Dudeperson176123 │ 18 │
-│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
-│ Wikipedia:In the news/Candidates │ 17 │
-│ Wikipedia:Requests for page protection │ 17 │
-└──────────────────────────────────────────────────────────┴───────┘
-Retrieved 10 rows in 0.06s.
+```sql
+SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
+FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
+GROUP BY 1
```
-### Additional Druid SQL queries
-
-#### Timeseries
+
-`SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM
wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP
'2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);`
+#### General group by
-```bash
-dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND
TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
-┌──────────────────────────┬──────────────┐
-│ HourTime │ LinesDeleted │
-├──────────────────────────┼──────────────┤
-│ 2015-09-12T00:00:00.000Z │ 1761 │
-│ 2015-09-12T01:00:00.000Z │ 16208 │
-│ 2015-09-12T02:00:00.000Z │ 14543 │
-│ 2015-09-12T03:00:00.000Z │ 13101 │
-│ 2015-09-12T04:00:00.000Z │ 12040 │
-│ 2015-09-12T05:00:00.000Z │ 6399 │
-│ 2015-09-12T06:00:00.000Z │ 9036 │
-│ 2015-09-12T07:00:00.000Z │ 11409 │
-│ 2015-09-12T08:00:00.000Z │ 11616 │
-│ 2015-09-12T09:00:00.000Z │ 17509 │
-│ 2015-09-12T10:00:00.000Z │ 19406 │
-│ 2015-09-12T11:00:00.000Z │ 16284 │
-│ 2015-09-12T12:00:00.000Z │ 18672 │
-│ 2015-09-12T13:00:00.000Z │ 30520 │
-│ 2015-09-12T14:00:00.000Z │ 18025 │
-│ 2015-09-12T15:00:00.000Z │ 26399 │
-│ 2015-09-12T16:00:00.000Z │ 24759 │
-│ 2015-09-12T17:00:00.000Z │ 19634 │
-│ 2015-09-12T18:00:00.000Z │ 17345 │
-│ 2015-09-12T19:00:00.000Z │ 19305 │
-│ 2015-09-12T20:00:00.000Z │ 22265 │
-│ 2015-09-12T21:00:00.000Z │ 16394 │
-│ 2015-09-12T22:00:00.000Z │ 16379 │
-│ 2015-09-12T23:00:00.000Z │ 15289 │
-└──────────────────────────┴──────────────┘
-Retrieved 24 rows in 0.08s.
+```sql
+SELECT channel, page, SUM(added)
+FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
+GROUP BY channel, page
+ORDER BY SUM(added) DESC
```
-#### GroupBy
+
-`SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP
'2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel
ORDER BY SUM(added) DESC LIMIT 5;`
+#### Select raw data
-```bash
-dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN
TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY
channel ORDER BY SUM(added) DESC LIMIT 5;
-┌───────────────┬─────────┐
-│ channel │ EXPR$1 │
-├───────────────┼─────────┤
-│ #en.wikipedia │ 3045299 │
-│ #it.wikipedia │ 711011 │
-│ #fr.wikipedia │ 642555 │
-│ #ru.wikipedia │ 640698 │
-│ #es.wikipedia │ 634670 │
-└───────────────┴─────────┘
-Retrieved 5 rows in 0.05s.
+```sql
+SELECT user, page
+FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
+LIMIT 5
```
-#### Scan
+
-` SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP
'2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;`
+### Explain query plan
-```bash
- dsql> SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP
'2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;
-┌────────────────────────┬────────────────────────────────────────────────────────┐
-│ user                   │ page                                                   │
-├────────────────────────┼────────────────────────────────────────────────────────┤
-│ Thiago89               │ Campeonato Mundial de Voleibol Femenino Sub-20 de 2015 │
-│ 91.34.200.249          │ Friede von Schönbrunn                                  │
-│ TuHan-Bot              │ Trĩ vàng                                               │
-│ Lowercase sigmabot III │ User talk:ErrantX                                      │
-│ BattyBot               │ Hans W. Jung                                           │
-└────────────────────────┴────────────────────────────────────────────────────────┘
-Retrieved 5 rows in 0.04s.
-```
+Druid SQL can explain the query plan for a given query.
+In the console, this functionality is accessible from the `...` button.
-#### EXPLAIN PLAN FOR
+
-By prepending `EXPLAIN PLAN FOR ` to a Druid SQL query, it is possible to see
what native Druid queries a SQL query will plan into.
+If you are querying in other ways, you can get the plan by prepending `EXPLAIN PLAN FOR ` to a Druid SQL query.
-Using the TopN query above as an example:
+Using a query from an example above:
`EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`
@@ -293,6 +214,90 @@ dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM
wikipedia WHERE "__ti
Retrieved 1 row in 0.03s.
```
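
When queries are built programmatically, the prefixing step described above can be sketched as a small string helper. This is only an illustration: `explain` is a hypothetical function name, and the check that avoids a double prefix is an assumption about what is convenient, not something Druid requires.

```python
EXPLAIN_PREFIX = "EXPLAIN PLAN FOR "


def explain(sql):
    """Return the query with EXPLAIN PLAN FOR prepended exactly once."""
    stripped = sql.strip()
    # Leave the query alone if it already starts with the prefix.
    if stripped.upper().startswith(EXPLAIN_PREFIX.strip()):
        return stripped
    return EXPLAIN_PREFIX + stripped
```

The returned string can then be pasted into `dsql` or sent over HTTP like any other Druid SQL query.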
+
+## Native JSON queries
+
+Druid's native query format is expressed in JSON.
+
+### Native query via the console
+
+You can issue native Druid queries from the console's Query view.
+
+Here is a query that retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.
+
+```json
+{
+ "queryType" : "topN",
+ "dataSource" : "wikipedia",
+ "intervals" : ["2015-09-12/2015-09-13"],
+ "granularity" : "all",
+ "dimension" : "page",
+ "metric" : "count",
+ "threshold" : 10,
+ "aggregations" : [
+ {
+ "type" : "count",
+ "name" : "count"
+ }
+ ]
+}
+```
+
+Simply paste it into the console to switch the editor into JSON mode.
+
+
+
+
+### Native queries over HTTP
+
+We have included a sample native TopN query under `quickstart/tutorial/wikipedia-top-pages.json`.
+
+Let's submit this query to Druid:
+
+```bash
+curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
+```
+
+You should see the following query results:
+
+```json
+[ {
+ "timestamp" : "2015-09-12T00:46:58.771Z",
+ "result" : [ {
+ "count" : 33,
+ "page" : "Wikipedia:Vandalismusmeldung"
+ }, {
+ "count" : 28,
+ "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
+ }, {
+ "count" : 27,
+ "page" : "Jeremy Corbyn"
+ }, {
+ "count" : 21,
+ "page" : "Wikipedia:Administrators' noticeboard/Incidents"
+ }, {
+ "count" : 20,
+ "page" : "Flavia Pennetta"
+ }, {
+ "count" : 18,
+ "page" : "Total Drama Presents: The Ridonculous Race"
+ }, {
+ "count" : 18,
+ "page" : "User talk:Dudeperson176123"
+ }, {
+ "count" : 18,
+ "page" : "Wikipédia:Le Bistro/12 septembre 2015"
+ }, {
+ "count" : 17,
+ "page" : "Wikipedia:In the news/Candidates"
+ }, {
+ "count" : 17,
+ "page" : "Wikipedia:Requests for page protection"
+ } ]
+} ]
+```
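
The native request and the shape of its response can also be sketched in Python. This is a minimal sketch under stated assumptions: the query dictionary mirrors the topN JSON shown in the console example earlier in this section (the contents of the bundled sample file are not reproduced here), the Router address matches the `curl` command, and `top_pages_query`, `build_native_request`, and `page_counts` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: same Router address as the curl example above.
DRUID_NATIVE_URL = "http://localhost:8888/druid/v2?pretty"


def top_pages_query(threshold=10):
    """Build the topN query from the console example as a Python dict."""
    return {
        "queryType": "topN",
        "dataSource": "wikipedia",
        "intervals": ["2015-09-12/2015-09-13"],
        "granularity": "all",
        "dimension": "page",
        "metric": "count",
        "threshold": threshold,
        "aggregations": [{"type": "count", "name": "count"}],
    }


def build_native_request(query, url=DRUID_NATIVE_URL):
    """Serialize the query dict and prepare a POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def page_counts(response):
    """Flatten a topN response into (page, count) pairs."""
    return [
        (row["page"], row["count"])
        for entry in response
        for row in entry["result"]
    ]
```

Passing the decoded JSON response to `page_counts` yields one pair per page, in the order Druid returned them.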
+
+
## Further reading
The [Queries documentation](../querying/querying.html) has more information on
Druid's native JSON queries.