weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414964427
##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
[Fri May 3 11:40:50 2019] Running command[middleManager], logging
to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid
middleManager conf/druid/single-server/micro-quickstart
```
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, is kept in the `var` directory under
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance.
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and
+terminates all Druid processes.
+
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888).

-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
- * added
- * channel
- * cityName
- * comment
- * countryIsoCode
- * countryName
- * deleted
- * delta
- * isAnonymous
- * isMinor
- * isNew
- * isRobot
- * isUnpatrolled
- * metroCode
- * namespace
- * page
- * regionIsoCode
- * regionName
- * user
-
-```json
-{
- "timestamp":"2015-09-12T20:03:45.018Z",
- "channel":"#en.wikipedia",
- "namespace":"Main",
- "page":"Spider-Man's powers and equipment",
- "user":"foobar",
- "comment":"/* Artificial web-shooters */",
- "cityName":"New York",
- "regionName":"New York",
- "regionIsoCode":"NY",
- "countryName":"United States",
- "countryIsoCode":"US",
- "isAnonymous":false,
- "isNew":false,
- "isMinor":false,
- "isRobot":false,
- "isUnpatrolled":false,
- "added":99,
- "delta":99,
- "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again.
-### Data loading tutorials
+## Step 4. Load data
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_,
+as we will do here.
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day.
Review comment:
I think it's worthwhile to mention the data file that we will be loading here. The original version has this part: "This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root." That makes it a bit clearer before diving into the base directory and file filter. I agree that the part talking about columns can go; I felt it was a bit too much even when I first read it.
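To make the point concrete, here is a sketch of how a reader could locate and peek at that sample file once the Druid package is unpacked. The path comes from the original doc text; the `DRUID_HOME` variable and the fallback message are illustrative assumptions, not anything the docs define:

```shell
# Peek at the bundled Wikipedia sample data (newline-delimited JSON, gzip-compressed).
# DRUID_HOME is an assumed convenience variable; default to the current directory.
DRUID_HOME="${DRUID_HOME:-.}"
SAMPLE="$DRUID_HOME/quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz"
if [ -f "$SAMPLE" ]; then
  # Each line is one page-edit event; show the first one.
  gzip -cd "$SAMPLE" | head -n 1
else
  echo "Sample file not found at $SAMPLE; run this from the Druid package root." >&2
fi
```

Showing the `gzip -cd … | head` one-liner (or just the path) right before the data loader steps would give readers a concrete anchor for the "base directory" and "file filter" values they are about to enter.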
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]