fjy commented on a change in pull request #7863: Added the web console to the quickstart tutorials and docs
URL: https://github.com/apache/incubator-druid/pull/7863#discussion_r294559052
########## File path: docs/content/tutorials/tutorial-batch.md ##########
@@ -24,18 +24,99 @@ title: "Tutorial: Loading a file"
 # Tutorial: Loading a file
 
-## Getting started
-
 This tutorial demonstrates how to perform a batch file load, using Apache Druid (incubating)'s native batch ingestion.
 
 For this tutorial, we'll assume you've already downloaded Druid as described in the [quickstart](index.html) using the `micro-quickstart` single-machine configuration and have it running on your local machine. You don't need to have loaded any data yet.
 
-## Preparing the data and the ingestion task spec
-
 A data load is initiated by submitting an *ingestion task* spec to the Druid Overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.
+
+An ingestion spec can be written by hand, or you can use the "Data loader" built into the Druid console, which helps you iteratively build a spec by sampling your data.
+The data loader currently supports only native batch ingestion (streaming support is coming soon).
+
+We've included a sample of Wikipedia edits from September 12, 2015 to get you started.
+
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888), click `Load data` in the console header, and select `Local disk`.
+
+Enter `quickstart/tutorial/` as the base directory and `wikiticker-2015-09-12-sampled.json.gz` as the file filter.
+The base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) are separate fields in case you need to ingest data from multiple files.
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+Once the data is located, you can click "Next: Parse data" to go to the next step.
+The data loader will try to automatically determine the correct parser for the data.
+In this case it will successfully determine `json`.
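
For reference, at this point the loader has filled in the parser portion of the ingestion spec. A rough sketch, based on the native batch ingestion format of this Druid era (the dimension list shown is illustrative, not the full set of columns in the sample data):

```json
"parser": {
  "type": "string",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "time",
      "format": "iso"
    },
    "dimensionsSpec": {
      "dimensions": ["channel", "page", "user"]
    }
  }
}
```

The later console steps (Parse time, Configure schema) edit exactly these nested fields, which is why changes made in the wizard show up in the spec review at the end.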
+Feel free to play around with different parser options to get a preview of how Druid will parse your data.
+
+With the `json` parser selected, click `Next: Parse time` to go to the step for determining your primary timestamp column.
+Druid's architecture requires a primary timestamp column (stored internally as `__time`); if your data has no natural timestamp, you can always use a `Constant value`.
+In this case, the data loader will select the `time` column as the primary time column, since it is the only column whose values look like timestamps.
+
+Click `Next: ...` twice to go past the `Transform` and `Filter` steps.
+You do not need to enter anything in these steps; applying ingestion-time transforms and filters is out of scope for this tutorial.
+
+In the `Configure schema` step, you can configure which dimensions (and metrics) will be ingested into Druid.
+This is exactly how the data will appear in Druid once it is ingested.
+Since our dataset is very small, go ahead and turn off `Rollup` by clicking on the switch and confirming the change.
+
+Once you are satisfied with the schema, click `Next` to go to the `Partition` step, where you can fine-tune how the data will be split up into segments in Druid.
+Since this is such a small dataset, no adjustments need to be made in this step.
+
+Clicking past the `Tune` step, we get to the publish step, where we can specify the name of the datasource in Druid.
+Let's name this datasource `wikipedia`.
+
+Finally, click `Next` to review your spec.
+This is the spec you have constructed.
+Feel free to go back to previous steps and see how making changes there manifests in the spec.
+Similarly, you can edit the spec directly and see it reflected in the other steps.
+
+Once you are satisfied with the spec, click `Submit`, and an ingestion task will be created.
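
If you prefer the command line to clicking `Submit`, the same spec can be saved to a file and posted to the Overlord's task API directly. A sketch, assuming the spec was saved as `my-wikipedia-spec.json` (the file name is hypothetical, and this requires the quickstart cluster to be running):

```
curl -X POST -H 'Content-Type: application/json' \
  -d @my-wikipedia-spec.json \
  http://localhost:8081/druid/indexer/v1/task
```

A successful submission returns a JSON body containing the new task's ID, which you can then follow in the console's task view just as with a console-submitted task.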
+
+You will be taken to the task view with the focus on the newly created task.
+
+In the task view, you can click `Refresh` a couple of times until your ingestion task (hopefully) succeeds.
+
+When a task succeeds, it means that it built one or more segments that will now be picked up by the data servers.
+
+Navigate to the `Datasources` view and click refresh until your datasource (`wikipedia`) appears.
+This could take a few seconds as the segments are being loaded.
+
+Once you see the datasource there with a green (fully available) circle, you can go to the `Query` view to run SQL queries against this datasource.

Review comment:
   Once you see the datasource**,**there
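
To round off the `Query` view step described in the diff above, a query in the spirit of Druid's companion query tutorial (illustrative; column names assume the Wikipedia sample schema) would be:

```sql
SELECT page, COUNT(*) AS "Edits"
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY "Edits" DESC
LIMIT 10
```

Seeing results from a query like this confirms that the segments built by the ingestion task are loaded and queryable.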
