surekhasaharan commented on a change in pull request #6126: New quickstart and tutorials
URL: https://github.com/apache/incubator-druid/pull/6126#discussion_r209071989
##########
File path: docs/content/tutorials/tutorial-compaction.md
##########
@@ -0,0 +1,103 @@
+---
+layout: doc_page
+---
+
+# Tutorial: Compacting segments
+
+This tutorial demonstrates how to compact existing segments into fewer but larger segments.
+
+For this tutorial, we'll assume you've already downloaded Druid as described in
+the [single-machine quickstart](index.html) and have it running on your local machine.
+
+It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+
+## Load the initial data
+
+For this tutorial, we'll be using the Wikipedia edits sample data, with an ingestion task spec that will create a separate segment for each hour in the input data.
+
+The ingestion spec can be found at `quickstart/tutorial/compaction-init-index.json`. Let's submit that spec, which will create a datasource called `compaction-tutorial`:
+
+```
+bin/post-index-task --file quickstart/tutorial/compaction-init-index.json
+```
+
+After the ingestion completes, go to http://localhost:8081/#/datasources/compaction-tutorial in a browser to view information about the new datasource in the Coordinator console.
+
+There will be 24 segments for this datasource, one segment per hour in the input data:
+
+![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+
+Running a COUNT(*) query on this datasource shows that there are 24,433 rows:
+
+```
+dsql> select count(*) from "compaction-tutorial";
+┌────────┐
+│ EXPR$0 │
+├────────┤
+│ 39244  │
+└────────┘
+Retrieved 1 row in 1.38s.
+```
+
+## Compact the data
+
+Let's now combine these 22 segments into one segment.

Review comment:
Not a comment on docs, but I didn't understand how it's 22 segments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.