fjy commented on a change in pull request #7863: Added the web console to the quickstart tutorials and docs
URL: https://github.com/apache/incubator-druid/pull/7863#discussion_r294559052
########## File path: docs/content/tutorials/tutorial-batch.md ##########
@@ -24,18 +24,99 @@ title: "Tutorial: Loading a file"
 # Tutorial: Loading a file
 
-## Getting started
-
 This tutorial demonstrates how to perform a batch file load, using Apache Druid (incubating)'s native batch ingestion.
 
 For this tutorial, we'll assume you've already downloaded Druid as described in the [quickstart](index.html) using the `micro-quickstart` single-machine configuration and have it running on your local machine. You don't need to have loaded any data yet.
 
-## Preparing the data and the ingestion task spec
-
 A data load is initiated by submitting an *ingestion task* spec to the Druid Overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.
+
+An ingestion spec can be written by hand, or you can use the "Data loader" built into the Druid console, which helps you iteratively build a spec by sampling your data.
+The data loader currently supports only native batch ingestion (streaming support is coming soon).
+
+We've included a sample of Wikipedia edits from September 12, 2015 to get you started.
+
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888), click `Load data` in the console header, and select `Local disk`.
+
+Enter `quickstart/tutorial/` as the base directory and `wikiticker-2015-09-12-sampled.json.gz` as the file filter.
+The base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) are separate fields in case you need to ingest data from multiple files.
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+Once the data is located, you can click "Next: Parse data" to go to the next step.
+The data loader will try to automatically determine the correct parser for the data.
+In this case it will successfully determine `json`.
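
For reference, at this point the loader has filled in the parser portion of the ingestion spec. A rough sketch, based on the native batch ingestion format of this Druid era (the dimension list shown is illustrative, not the full set of columns in the sample data):

```json
"parser": {
  "type": "string",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "time",
      "format": "iso"
    },
    "dimensionsSpec": {
      "dimensions": ["channel", "page", "user"]
    }
  }
}
```

The later console steps (Parse time, Configure schema) edit exactly these nested fields, which is why changes made in the wizard show up in the spec review at the end.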
+Feel free to play around with different parser options to get a preview of how Druid will parse your data.
+
+With the `json` parser selected, click `Next: Parse time` to go to the step for determining your primary timestamp column.
+Druid's architecture requires a primary timestamp column (stored internally as `__time`); if your data has no natural timestamp, you can always use a `Constant value`.
+In this case, the data loader will select the `time` column as the primary time column, since it is the only column whose values look like timestamps.
+
+Click `Next: ...` twice to go past the `Transform` and `Filter` steps.
+You do not need to enter anything in these steps; applying ingestion-time transforms and filters is out of scope for this tutorial.
+
+In the `Configure schema` step, you can configure which dimensions (and metrics) will be ingested into Druid.
+This is exactly how the data will appear in Druid once it is ingested.
+Since our dataset is very small, go ahead and turn off `Rollup` by clicking on the switch and confirming the change.
+
+Once you are satisfied with the schema, click `Next` to go to the `Partition` step, where you can fine-tune how the data will be split up into segments in Druid.
+Since this is such a small dataset, no adjustments need to be made in this step.
+
+Clicking past the `Tune` step, we get to the publish step, where we can specify the name of the datasource in Druid.
+Let's name this datasource `wikipedia`.
+
+Finally, click `Next` to review your spec.
+This is the spec you have constructed.
+Feel free to go back to previous steps and see how making changes there manifests in the spec.
+Similarly, you can edit the spec directly and see it reflected in the other steps.
+
+Once you are satisfied with the spec, click `Submit`, and an ingestion task will be created.
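
If you prefer the command line to clicking `Submit`, the same spec can be saved to a file and posted to the Overlord's task API directly. A sketch, assuming the spec was saved as `my-wikipedia-spec.json` (the file name is hypothetical, and this requires the quickstart cluster to be running):

```
curl -X POST -H 'Content-Type: application/json' \
  -d @my-wikipedia-spec.json \
  http://localhost:8081/druid/indexer/v1/task
```

A successful submission returns a JSON body containing the new task's ID, which you can then follow in the console's task view just as with a console-submitted task.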
+
+You will be taken to the task view with the focus on the newly created task.
+
+In the task view, you can click `Refresh` a couple of times until your ingestion task (hopefully) succeeds.
+
+When a task succeeds, it means that it built one or more segments that will now be picked up by the data servers.
+
+Navigate to the `Datasources` view and click refresh until your datasource (`wikipedia`) appears.
+This could take a few seconds as the segments are being loaded.
+
+Once you see the datasource there with a green (fully available) circle, you can go to the `Query` view to run SQL queries against this datasource.

Review comment:
   Once you see the datasource**,**there
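
To round off the `Query` view step described in the diff above, a query in the spirit of Druid's companion query tutorial (illustrative; column names assume the Wikipedia sample schema) would be:

```sql
SELECT page, COUNT(*) AS "Edits"
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY "Edits" DESC
LIMIT 10
```

Seeing results from a query like this confirms that the segments built by the ingestion task are loaded and queryable.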
