fjy commented on a change in pull request #7863: Added the web console to the 
quickstart tutorials and docs
URL: https://github.com/apache/incubator-druid/pull/7863#discussion_r292636366
 
 

 ##########
 File path: docs/content/tutorials/tutorial-batch.md
 ##########
 @@ -24,18 +24,98 @@ title: "Tutorial: Loading a file"
 
 # Tutorial: Loading a file
 
-## Getting started
-
 This tutorial demonstrates how to perform a batch file load, using Apache 
Druid (incubating)'s native batch ingestion.
 
 For this tutorial, we'll assume you've already downloaded Druid as described 
in 
 the [quickstart](index.html) using the `micro-quickstart` single-machine 
configuration and have it
 running on your local machine. You don't need to have loaded any data yet.
 
-## Preparing the data and the ingestion task spec
-
 A data load is initiated by submitting an *ingestion task* spec to the Druid 
Overlord. For this tutorial, we'll be loading the sample Wikipedia page edits 
data.
 
+An ingestion spec can be written by hand, or you can use the "Data loader" built into the Druid console, which helps you iteratively build one by sampling your data.
+The data loader currently only supports native batch ingestion (streaming support is coming soon), so we can use it for this tutorial.
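+
+To give a sense of what the data loader is assembling for you, here is a rough sketch of the skeleton of a native batch task spec (the exact fields the loader generates can vary by Druid version):
+
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {},
+    "dataSchema": {},
+    "tuningConfig": {}
+  }
+}
+```
+
+The `ioConfig` section describes where the data comes from, `dataSchema` describes how to parse, timestamp, and organize it, and `tuningConfig` holds optional performance settings. Each step below fills in a piece of this skeleton.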
+
+We've included a sample of Wikipedia edits from September 12, 2015 to get you 
started.
+
+
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888), click `Load data` in the console header, and select `Local disk`.
+
+![Data loader init](../tutorials/img/tutorial-batch-data-loader-01.png "Data 
loader init")
+
+Enter `quickstart/tutorial/` as the base directory and `wikiticker-2015-09-12-sampled.json.gz` as the filter.
+The base directory and filter are separate fields so that you can ingest data from multiple files at once.
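+
+Under the hood, these two fields map to the local input's `baseDir` and `filter` parameters inside the spec's `ioConfig`; a sketch (in older Druid releases this block is named `firehose`, in newer ones `inputSource`):
+
+```json
+"ioConfig": {
+  "type": "index_parallel",
+  "firehose": {
+    "type": "local",
+    "baseDir": "quickstart/tutorial/",
+    "filter": "wikiticker-2015-09-12-sampled.json.gz"
+  }
+}
+```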
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+![Data loader sample](../tutorials/img/tutorial-batch-data-loader-02.png "Data 
loader sample")
+
+Once the data is located, you can click "Next: Parse data" to go to the next step.
+The data loader will try to automatically determine the correct parser for the 
data.
+In this case it will successfully determine `json`.
+Feel free to play around with different parser options to get a preview of how 
Druid will parse your data.
+
+![Data loader parse data](../tutorials/img/tutorial-batch-data-loader-03.png 
"Data loader parse data")
+
+With the `json` parser selected, click `Next: Parse time` to get to the step centered around determining your primary timestamp column.
+Druid's architecture requires a primary timestamp column (internally stored in a column called `__time`); if your data lacks a natural timestamp, you can select `Constant value`.
+In this case the data loader will correctly guess `time` as the primary time column, since it is the only column whose values look like timestamps.
+
+![Data loader parse time](../tutorials/img/tutorial-batch-data-loader-04.png 
"Data loader parse time")
+
+Click `Next: ...` twice to skip past the `Transform` and `Filter` steps. You do not need to enter anything there, and applying ingestion-time transforms and filters is out of scope for this tutorial.
+
+In the schema stage you can configure which dimensions (and metrics) will be 
ingested into Druid.
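+
+In the spec, these choices become a `dimensionsSpec` (plus a `metricsSpec` if rollup is enabled); a sketch using a few of the sample's columns, with a numeric one typed explicitly:
+
+```json
+"dimensionsSpec": {
+  "dimensions": [
+    "channel",
+    "cityName",
+    "comment",
+    { "name": "added", "type": "long" }
+  ]
+}
+```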
 
 Review comment:
   In the schema stage**, you** 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
