weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414966246
##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
[Fri May 3 11:40:50 2019] Running command[middleManager], logging
to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid
middleManager conf/druid/single-server/micro-quickstart
```
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, is kept in the `var` directory under
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by
deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after
experimentation, to start with a fresh instance.
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the
`bin/start-micro-quickstart` script and
+terminates all Druid processes.
+
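The reset described above amounts to deleting the `var` directory from the Druid package root. As a minimal sketch, demonstrated here against a throwaway directory standing in for the package root (the `DRUID_HOME` variable is a stand-in for this demo only, not something Druid sets):

```shell
# Stand-in for the apache-druid-{{DRUIDVERSION}} package root (demo only).
DRUID_HOME=$(mktemp -d)

# Simulate state that accumulates under var/: segments, metadata, service logs.
mkdir -p "$DRUID_HOME/var/sv"
touch "$DRUID_HOME/var/sv/middleManager.log"

# Reverting to a post-installation state: stop Druid (CTRL-C), then delete var/.
rm -rf "$DRUID_HOME/var"

test ! -d "$DRUID_HOME/var" && echo "reset complete"
```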
-Later on, if you'd like to stop the services, CTRL-C to exit the
`bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console
-Once the cluster has started, you can navigate to
[http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid
console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid
console](../operations/druid-console.md) at
[http://localhost:8888](http://localhost:8888).

-It takes a few seconds for all the Druid processes to fully start up. If you
open the console immediately after starting the services, you may see some
errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file
containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at
`quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid
package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
- * added
- * channel
- * cityName
- * comment
- * countryIsoCode
- * countryName
- * deleted
- * delta
- * isAnonymous
- * isMinor
- * isNew
- * isRobot
- * isUnpatrolled
- * metroCode
- * namespace
- * page
- * regionIsoCode
- * regionName
- * user
-
-```json
-{
- "timestamp":"2015-09-12T20:03:45.018Z",
- "channel":"#en.wikipedia",
- "namespace":"Main",
- "page":"Spider-Man's powers and equipment",
- "user":"foobar",
- "comment":"/* Artificial web-shooters */",
- "cityName":"New York",
- "regionName":"New York",
- "regionIsoCode":"NY",
- "countryName":"United States",
- "countryIsoCode":"US",
- "isAnonymous":false,
- "isNew":false,
- "isMinor":false,
- "isRobot":false,
- "isUnpatrolled":false,
- "added":99,
- "delta":99,
- "deleted":0
-}
-```
+It may take a few seconds for all Druid services to finish starting, including
the [Druid router](../design/router.md), which serves the console. If you
attempt to open the Druid console before startup is complete, you may see
errors in the browser. Wait a few moments and try again.
-### Data loading tutorials
+## Step 4. Load data
-The following tutorials demonstrate various methods of loading data into
Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine
configuration mentioned above.
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to
perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial
demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this
tutorial demonstrates how to perform a batch file load, using a remote Hadoop
cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this
tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or use the _data loader_,
+as we will do here.
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents
Wikipedia page edits on a given day.
-If you want a clean start after stopping the services, delete the `var`
directory and run the `bin/start-micro-quickstart` script again.
+1. Click **Load data** from the Druid console header.
-Once every service has started, you are now ready to load data.
+2. Select the **Local disk** tile and then click **Connect data**.
-#### Resetting Kafka
+ 
+
+3. Enter the following values:
+
+ - **Base directory**: `quickstart/tutorial/`
+
+ - **File filter**: `wikiticker-2015-09-12-sampled.json.gz`
+
+ 
+
+ Entering the base directory and [wildcard file
filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html)
separately, as afforded by the UI, allows you to specify multiple files for
ingestion at once.
+
+4. Click **Apply**.
+
+ The data loader displays the raw data, giving you a chance to verify that
the data
+ appears as expected.
+
+ 
+
+ Notice that your position in the sequence of steps to load data,
**Connect** in our case, appears at the top of the console, as shown below.
+ You can click other steps to move forward or backward in the sequence at
any time.
+
+ 
+
+
+5. Click **Next: Parse data**.
+
+ The data loader tries to determine the parser appropriate for the data
format automatically. In this case
+ it identifies the data format as `json`, as shown in the **Input format**
field at the bottom right.
+
+ 
+
+ Feel free to select other **Input format** options to get a sense of their
configuration settings
+ and how Druid parses other types of data.
+
+6. With the JSON parser selected, click **Next: Parse time**. The **Parse
time** settings are where you view and adjust the
+ primary timestamp column for the data.
+
+ 
+
+ Druid requires data to have a primary timestamp column (internally stored
in a column called `__time`).
+ If you do not have a timestamp in your data, select `Constant value`. In
our example, the data loader
+ determines that the `time` column is the only candidate that can be used as
the primary time column.
+
+7. Click **Next: Transform**, **Next: Filter**, and then **Next: Configure
schema**, skipping a few steps.
+
+   You do not need to adjust transformation or filtering settings, as applying ingestion-time transforms and
+   filters is out of scope for this tutorial.
+
+8. The **Configure schema** settings are where you configure which [dimensions](../ingestion/index.md#dimensions)
+   and [metrics](../ingestion/index.md#metrics) are ingested. The outcome of this configuration determines exactly how the
+   data will appear in Druid after ingestion.
+
+ Since our dataset is very small, you can turn off
[rollup](../ingestion/index.md#rollup)
+ by unsetting the **Rollup** switch and confirming the change when prompted.
+
+ 
+
+
+9. Click **Next: Partition** to configure how the data will be split into segments. In this case, choose `DAY` as
+   the **Segment Granularity**.
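Behind the scenes, the data loader turns these choices into a JSON ingestion spec and submits it for you. Below is a hand-written sketch of roughly what the steps above describe, using Druid's native batch (`index_parallel`) spec format; the datasource name `wikipedia` and the `queryGranularity` value are assumptions for illustration, not actual loader output:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "quickstart/tutorial/",
        "filter": "wikiticker-2015-09-12-sampled.json.gz"
      },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "wikipedia",
      "timestampSpec": { "column": "time", "format": "iso" },
      "dimensionsSpec": {
        "dimensions": [
          "added", "channel", "cityName", "comment", "countryIsoCode",
          "countryName", "deleted", "delta", "isAnonymous", "isMinor",
          "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace",
          "page", "regionIsoCode", "regionName", "user"
        ]
      },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```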
Review comment:
"Segment Granularity" should be "Segment granularity"