surekhasaharan commented on a change in pull request #6126: New quickstart and tutorials
URL: https://github.com/apache/incubator-druid/pull/6126#discussion_r209071989
 
 

 ##########
 File path: docs/content/tutorials/tutorial-compaction.md
 ##########
 @@ -0,0 +1,103 @@
+---
+layout: doc_page
+---
+
+# Tutorial: Compacting segments
+
+This tutorial demonstrates how to compact existing segments into fewer but larger segments.
+
+For this tutorial, we'll assume you've already downloaded Druid as described in 
+the [single-machine quickstart](index.html) and have it running on your local machine. 
+
+It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+
+## Load the initial data
+
+For this tutorial, we'll be using the Wikipedia edits sample data, with an ingestion task spec that will create a separate segment for each hour in the input data.
+
+The ingestion spec can be found at `quickstart/tutorial/compaction-init-index.json`. Let's submit that spec, which will create a datasource called `compaction-tutorial`:
+
+```
+bin/post-index-task --file quickstart/tutorial/compaction-init-index.json 
+```
+
+After the ingestion completes, go to http://localhost:8081/#/datasources/compaction-tutorial in a browser to view information about the new datasource in the Coordinator console.
+
+There will be 24 segments for this datasource, one segment per hour in the input data:
+
+![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+
+Running a COUNT(*) query on this datasource shows that there are 39,244 rows:
+
+```
+dsql> select count(*) from "compaction-tutorial";
+┌────────┐
+│ EXPR$0 │
+├────────┤
+│  39244 │
+└────────┘
+Retrieved 1 row in 1.38s.
+```
+
+## Compact the data
+
+Let's now combine these 22 segments into one segment.
 
 Review comment:
   Not a comment on docs, but I didn't understand how it's 22 segments.
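As a rough sanity check on the expected count, here is a minimal sketch, assuming the spec uses HOUR segment granularity, that each non-empty hour produces exactly one segment (no secondary partitioning), and that rows carry ISO-8601 timestamps. The helper `count_hourly_segments` and the sample timestamps are hypothetical. Under those assumptions the segment count is just the number of distinct hours that contain data, which lines up with the "24 segments, one per hour" stated earlier in the doc.

```python
from datetime import datetime


def count_hourly_segments(timestamps):
    """Count distinct hour buckets in a list of ISO-8601 timestamp strings.

    Assumption: HOUR segment granularity with one segment per non-empty
    hour, so the number of buckets equals the number of segments.
    """
    buckets = set()
    for ts in timestamps:
        # Accept a trailing "Z" by rewriting it as an explicit UTC offset,
        # since older datetime.fromisoformat versions reject "Z".
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        # Truncate to the start of the hour: this is the segment interval start.
        buckets.add(dt.replace(minute=0, second=0, microsecond=0))
    return len(buckets)


# Hypothetical sample: one event in each hour of a single day.
sample = [f"2015-09-12T{h:02d}:30:00.000Z" for h in range(24)]
print(count_hourly_segments(sample))
```

If the real input data skipped two hours, this would report 22, so the number in the doc should match however many hours actually contain rows.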

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.