weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414964427
##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
[Fri May 3 11:40:50 2019] Running command[middleManager], logging
to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid
middleManager conf/druid/single-server/micro-quickstart
```
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, is kept in the `var` directory under
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance.
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and
+terminates all Druid processes.
+
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888).

-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
- * added
- * channel
- * cityName
- * comment
- * countryIsoCode
- * countryName
- * deleted
- * delta
- * isAnonymous
- * isMinor
- * isNew
- * isRobot
- * isUnpatrolled
- * metroCode
- * namespace
- * page
- * regionIsoCode
- * regionName
- * user
-
-```json
-{
- "timestamp":"2015-09-12T20:03:45.018Z",
- "channel":"#en.wikipedia",
- "namespace":"Main",
- "page":"Spider-Man's powers and equipment",
- "user":"foobar",
- "comment":"/* Artificial web-shooters */",
- "cityName":"New York",
- "regionName":"New York",
- "regionIsoCode":"NY",
- "countryName":"United States",
- "countryIsoCode":"US",
- "isAnonymous":false,
- "isNew":false,
- "isMinor":false,
- "isRobot":false,
- "isUnpatrolled":false,
- "added":99,
- "delta":99,
- "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again.
-### Data loading tutorials
+## Step 4. Load data
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_,
+as we will do here.
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day.
Review comment:
I think it's worthwhile to mention the data file that we will be loading here. The original version has this part: "This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root." That makes it a bit clearer before diving into the base directory and file filter. I agree that the part talking about columns can go; I felt it was a bit too much even when I first read it.
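To make the point concrete, here is a sketch of how a reader could locate and peek at that sample file once the Druid package is unpacked. The path comes from the original doc text; the `DRUID_HOME` variable and the fallback message are illustrative assumptions, not anything the docs define:

```shell
# Peek at the bundled Wikipedia sample data (newline-delimited JSON, gzip-compressed).
# DRUID_HOME is an assumed convenience variable; default to the current directory.
DRUID_HOME="${DRUID_HOME:-.}"
SAMPLE="$DRUID_HOME/quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz"
if [ -f "$SAMPLE" ]; then
  # Each line is one page-edit event; show the first one.
  gzip -cd "$SAMPLE" | head -n 1
else
  echo "Sample file not found at $SAMPLE; run this from the Druid package root." >&2
fi
```

Showing the `gzip -cd … | head` one-liner (or just the path) right before the data loader steps would give readers a concrete anchor for the "base directory" and "file filter" values they are about to enter.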
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]