This is an automated email from the ASF dual-hosted git repository.
techdocsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 355c800108b Revamp design page (#15486)
355c800108b is described below
commit 355c800108be323ec986589123383709bf4652df
Author: Katya Macedo <[email protected]>
AuthorDate: Fri Dec 8 13:40:24 2023 -0600
Revamp design page (#15486)
Co-authored-by: Victoria Lim <[email protected]>
---
docs/api-reference/legacy-metadata-api.md | 2 +-
docs/assets/druid-architecture.svg | 19 ++
docs/configuration/index.md | 6 +-
docs/design/architecture.md | 348 +++++++++---------------------
docs/design/broker.md | 32 ++-
docs/design/coordinator.md | 83 ++++---
docs/design/historical.md | 41 ++--
docs/design/indexer.md | 50 ++---
docs/design/middlemanager.md | 16 +-
docs/design/overlord.md | 27 ++-
docs/design/peons.md | 18 +-
docs/design/processes.md | 143 ------------
docs/design/router.md | 81 ++++---
docs/design/storage.md | 140 ++++++++++++
docs/development/experimental-features.md | 6 +-
docs/development/modules.md | 2 +-
docs/ingestion/index.md | 2 +-
docs/querying/query-processing.md | 48 +++++
website/.spelling | 3 +-
website/redirects.js | 7 +-
website/sidebars.json | 17 +-
21 files changed, 513 insertions(+), 578 deletions(-)
diff --git a/docs/api-reference/legacy-metadata-api.md
b/docs/api-reference/legacy-metadata-api.md
index ae75a0b48c0..453159c1a58 100644
--- a/docs/api-reference/legacy-metadata-api.md
+++ b/docs/api-reference/legacy-metadata-api.md
@@ -289,7 +289,7 @@ Returns a list of server data objects in which each object
has the following key
## Query server
-This section documents the API endpoints for the processes that reside on
Query servers (Brokers) in the suggested [three-server
configuration](../design/processes.md#server-types).
+This section documents the API endpoints for the services that reside on Query
servers (Brokers) in the suggested [three-server
configuration](../design/architecture.md#druid-servers).
### Broker
diff --git a/docs/assets/druid-architecture.svg
b/docs/assets/druid-architecture.svg
new file mode 100644
index 00000000000..3f86a412cfa
--- /dev/null
+++ b/docs/assets/druid-architecture.svg
@@ -0,0 +1,19 @@
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="1967.24"
height="864.83"><g transform="translate(5341 -19.5)"
lucid:page-tab-id="0_0"><path d="M-5320 166a6 6 0 0 1 6-6h508a6 6 0 0 1 6
6v308a6 6 0 0 1-6 6h-508a6 6 0 0 1-6-6zM-4640 166a6 6 0 0 1 6-6h508a6 6 0 0 1 6
6v308a6 6 0 0 1-6 6h-508a6 6 0 0 1-6-6zM-3960 166a6 6 0 0 1 6-6h508a6 6 0 0 1 6
6v308a6 6 0 0 1-6 6h-508a6 6 0 0 1-6-6z" stroke="#cfe4ff" stroke-width="2"
fill="#cfe4 [...]
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index c40af8ca841..3c4ef302420 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -836,7 +836,7 @@ This section contains the configuration options for
endpoints that are supported
## Master server
-This section contains the configuration options for the services that reside
on Master servers (Coordinators and Overlords) in the suggested [three-server
configuration](../design/processes.md#server-types).
+This section contains the configuration options for the services that reside
on Master servers (Coordinators and Overlords) in the suggested [three-server
configuration](../design/architecture.md#druid-servers).
### Coordinator
@@ -1393,7 +1393,7 @@ For GCE's properties, please refer to the
[gce-extensions](../development/extens
## Data server
-This section contains the configuration options for the services that reside
on Data servers (MiddleManagers/Peons and Historicals) in the suggested
[three-server configuration](../design/processes.md#server-types).
+This section contains the configuration options for the services that reside
on Data servers (MiddleManagers/Peons and Historicals) in the suggested
[three-server configuration](../design/architecture.md#druid-servers).
Configuration options for the [Indexer process](../design/indexer.md) are also
provided here.
@@ -1722,7 +1722,7 @@ See [cache configuration](#cache-configuration) for how
to configure cache setti
## Query server
-This section contains the configuration options for the processes that reside
on Query servers (Brokers) in the suggested [three-server
configuration](../design/processes.md#server-types).
+This section contains the configuration options for the services that reside
on Query servers (Brokers) in the suggested [three-server
configuration](../design/architecture.md#druid-servers).
Configuration options for the experimental [Router
process](../design/router.md) are also provided here.
diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index df59dcb25ea..188a0b44164 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -1,6 +1,6 @@
---
id: architecture
-title: "Design"
+title: "Architecture"
---
<!--
@@ -23,45 +23,118 @@ title: "Design"
-->
-Druid has a distributed architecture that is designed to be cloud-friendly and
easy to operate. You can configure and scale services independently so you have
maximum flexibility over cluster operations. This design includes enhanced
fault tolerance: an outage of one component does not immediately affect other
components.
+Druid has a distributed architecture that is designed to be cloud-friendly and
easy to operate. You can configure and scale services independently for maximum
flexibility over cluster operations. This design includes enhanced fault
tolerance: an outage of one component does not immediately affect other
components.
-## Druid architecture
+The following diagram shows the services that make up the Druid architecture,
their typical arrangement across servers, and how queries and data flow through
this architecture.
-The following diagram shows the services that make up the Druid architecture,
how they are typically organized into servers, and how queries and data flow
through this architecture.
+
-
-
-The following sections describe the components of this architecture.
+The following sections describe the components of this architecture.
## Druid services
Druid has several types of services:
-* [**Coordinator**](../design/coordinator.md) service manages data
availability on the cluster.
-* [**Overlord**](../design/overlord.md) service controls the assignment of
data ingestion workloads.
-* [**Broker**](../design/broker.md) handles queries from external clients.
-* [**Router**](../design/router.md) services are optional; they route requests
to Brokers, Coordinators, and Overlords.
-* [**Historical**](../design/historical.md) services store queryable data.
-* [**MiddleManager**](../design/middlemanager.md) services ingest data.
+* [Coordinator](../design/coordinator.md) manages data availability on the
cluster.
+* [Overlord](../design/overlord.md) controls the assignment of data ingestion
workloads.
+* [Broker](../design/broker.md) handles queries from external clients.
+* [Router](../design/router.md) routes requests to Brokers, Coordinators, and
Overlords.
+* [Historical](../design/historical.md) stores queryable data.
+* [MiddleManager](../design/middlemanager.md) and [Peon](../design/peons.md)
ingest data.
+* [Indexer](../design/indexer.md) serves as an alternative to the MiddleManager +
Peon task execution system.
You can view services in the **Services** tab in the web console:

-
## Druid servers
-Druid services can be deployed any way you like, but for ease of deployment we
suggest organizing them into three server types: Master, Query, and Data.
+You can deploy Druid services according to your preferences. For ease of
deployment, we recommend organizing them into three server types:
[Master](#master-server), [Query](#query-server), and [Data](#data-server).
+
+### Master server
+
+A Master server manages data ingestion and availability. It is responsible for
starting new ingestion jobs and coordinating availability of data on the [Data
server](#data-server).
+
+Master servers divide operations between Coordinator and Overlord services.
+
+#### Coordinator service
+
+[Coordinator](../design/coordinator.md) services watch over the Historical
services on the Data servers. They are responsible for assigning segments to
specific servers, and for ensuring segments are well-balanced across
Historicals.
+
+#### Overlord service
+
+[Overlord](../design/overlord.md) services watch over the MiddleManager
services on the Data servers and are the controllers of data ingestion into
Druid. They are responsible for assigning ingestion tasks to MiddleManagers and
for coordinating segment publishing.
+
+### Query server
+
+A Query server provides the endpoints that users and client applications
interact with, routing queries to Data servers or other Query servers (and
optionally proxied Master server requests).
+
+Query servers divide operations between Broker and Router services.
+
+#### Broker service
+
+[Broker](../design/broker.md) services receive queries from external clients
and forward those queries to Data servers. When Brokers receive results from
those subqueries, they merge those results and return them to the caller.
Typically, you query Brokers rather than querying Historical or MiddleManager
services on Data servers directly.
+
+#### Router service
+
+[Router](../design/router.md) services provide a unified API gateway in
front of Brokers, Overlords, and Coordinators.
+
+The Router service also runs the [web console](../operations/web-console.md),
a UI for loading data, managing datasources and tasks, and viewing server
status and segment information.
+
+### Data server
+
+A Data server executes ingestion jobs and stores queryable data.
+
+Data servers divide operations between Historical and MiddleManager services.
+
+#### Historical service
+
+[Historical](../design/historical.md) services handle storage and querying
on historical data, including any streaming data that has been in the system
long enough to be committed. Historical services download segments from deep
storage and respond to queries about these segments. They don't accept writes.
+
+#### MiddleManager service
+
+[MiddleManager](../design/middlemanager.md) services handle ingestion of
new data into the cluster. They are responsible
+for reading from external data sources and publishing new Druid segments.
-* **Master**: Runs Coordinator and Overlord processes, manages data
availability and ingestion.
-* **Query**: Runs Broker and optional Router processes, handles queries from
external clients.
-* **Data**: Runs Historical and MiddleManager processes, executes ingestion
workloads and stores all queryable data.
+##### Peon service
-For more details on process and server organization, please see [Druid
Processes and Servers](../design/processes.md).
+[Peon](../design/peons.md) services are task execution engines spawned by
MiddleManagers. Each Peon runs a separate JVM and is responsible for executing
a single task. Peons always run on the same host as the MiddleManager that
spawned them.
+
+#### Indexer service (optional)
+
+[Indexer](../design/indexer.md) services are an alternative to
MiddleManagers and Peons. Instead of
+forking separate JVM processes per-task, the Indexer runs tasks as individual
threads within a single JVM process.
+
+The Indexer is designed to be easier to configure and deploy compared to the
MiddleManager + Peon system and to better enable resource sharing across tasks.
The Indexer is a newer feature and is currently designated
[experimental](../development/experimental.md) because its memory
management system is still under
+development. It will continue to mature in future versions of Druid.
+
+Typically, you would deploy either MiddleManagers or Indexers, but not both.
+
+## Colocation of services
+
+Colocating Druid services by server type generally results in better
utilization of hardware resources for most clusters.
+For very large scale clusters, it can be desirable to split the Druid services
such that they run on individual servers to avoid resource contention.
+
+This section describes guidelines and configuration parameters related to
service colocation.
+
+### Coordinators and Overlords
+
+The workload on the Coordinator service tends to increase with the number of
segments in the cluster. The Overlord's workload also increases based on the
number of segments in the cluster, but to a lesser degree than the Coordinator.
+
+In clusters with very high segment counts, it can make sense to separate the
Coordinator and Overlord services to provide more resources for the
Coordinator's segment balancing workload.
+
+You can run the Coordinator and Overlord services as a single combined service
by setting the `druid.coordinator.asOverlord.enabled` property.
+For more information, see [Coordinator
Operation](../configuration/index.md#coordinator-operation).
+
+### Historicals and MiddleManagers
+
+With higher levels of ingestion or query load, it can make sense to deploy the
Historical and MiddleManager services on separate hosts to avoid CPU and
memory contention.
+
+The Historical service also benefits from having free memory for memory mapped
segments, which can be another reason to deploy the Historical and
MiddleManager services separately.
## External dependencies
-In addition to its built-in process types, Druid also has three external
dependencies. These are intended to be able to
+In addition to its built-in service types, Druid also has three external
dependencies. These are intended to be able to
leverage existing infrastructure, where present.
### Deep storage
@@ -72,18 +145,18 @@ HDFS, or a network mounted filesystem. In a single-server
deployment, this is ty
Druid uses deep storage for the following purposes:
-- To store all the data you ingest. Segments that get loaded onto Historical
processes for low latency queries are also kept in deep storage for backup
purposes. Additionally, segments that are only in deep storage can be used for
[queries from deep storage](../querying/query-from-deep-storage.md).
-- As a way to transfer data in the background between Druid processes. Druid
stores data in files called _segments_.
+- To store all the data you ingest. Segments that get loaded onto Historical
services for low latency queries are also kept in deep storage for backup
purposes. Additionally, segments that are only in deep storage can be used for
[queries from deep storage](../querying/query-from-deep-storage.md).
+- As a way to transfer data in the background between Druid services. Druid
stores data in files called _segments_.
-Historical processes cache data segments on local disk and serve queries from
that cache as well as from an in-memory cache.
-Segments on disk for Historical processes provide the low latency querying
performance Druid is known for.
+Historical services cache data segments on local disk and serve queries from
that cache as well as from an in-memory cache.
+Segments on disk for Historical services provide the low latency querying
performance Druid is known for.
-You can also query directly from deep storage. When you query segments that
exist only in deep storage, you trade some performance for the ability to
query more of your data without necessarily having to scale your Historical
processes.
+You can also query directly from deep storage. When you query segments that
exist only in deep storage, you trade some performance for the ability to
query more of your data without necessarily having to scale your Historical
services.
When determining sizing for your storage, keep the following in mind:
- Deep storage needs to be able to hold all the data that you ingest into
Druid.
-- On disk storage for Historical processes need to be able to accommodate the
data you want to load onto them to run queries. The data on Historical
processes should be data you access frequently and need to run low latency
queries for.
+- On disk storage for Historical services needs to be able to accommodate the
data you want to load onto them to run queries. The data on Historical services
should be data you access frequently and need to run low latency queries for.
Deep storage is an important part of Druid's elastic, fault-tolerant design.
Druid bootstraps from deep storage even
if every single data server is lost and re-provisioned.
@@ -104,223 +177,10 @@ Used for internal service discovery, coordination, and
leader election.
For more details, please see the [ZooKeeper](zookeeper.md) page.
+## Learn more
-## Storage design
-
-### Datasources and segments
-
-Druid data is stored in _datasources_, which are similar to tables in a
traditional RDBMS. Each datasource is
-partitioned by time and, optionally, further partitioned by other attributes.
Each time range is called a _chunk_ (for
-example, a single day, if your datasource is partitioned by day). Within a
chunk, data is partitioned into one or more
-[_segments_](../design/segments.md). Each segment is a single file, typically
comprising up to a few million rows of data. Since segments are
-organized into time chunks, it's sometimes helpful to think of segments as
living on a timeline like the following:
-
-
-
-A datasource may have anywhere from just a few segments, up to hundreds of
thousands and even millions of segments. Each
-segment is created by a MiddleManager as _mutable_ and _uncommitted_. Data is
queryable as soon as it is added to
-an uncommitted segment. The segment
-building process accelerates later queries by producing a data file that is
compact and indexed:
-
-- Conversion to columnar format
-- Indexing with bitmap indexes
-- Compression
- - Dictionary encoding with id storage minimization for String columns
- - Bitmap compression for bitmap indexes
- - Type-aware compression for all columns
-
-Periodically, segments are _committed_ and _published_ to [deep
storage](#deep-storage),
-become immutable, and move from MiddleManagers to the Historical processes. An
entry about the segment is also written
-to the [metadata store](#metadata-storage). This entry is a self-describing
bit of metadata about the segment, including
-things like the schema of the segment, its size, and its location on deep
storage. These entries tell the
-Coordinator what data is available on the cluster.
-
-For details on the segment file format, please see [segment
files](segments.md).
-
-For details on modeling your data in Druid, see [schema
design](../ingestion/schema-design.md).
-
-### Indexing and handoff
-
-_Indexing_ is the mechanism by which new segments are created, and _handoff_
is the mechanism by which they are published
-and begin being served by Historical processes. On the indexing side:
+See the following topics for more information:
-1. An _indexing task_ starts running and building a new segment. It must
determine the identifier of the segment before
-it starts building it. For a task that is appending (like a Kafka task, or an
index task in append mode) this is
-done by calling an "allocate" API on the Overlord to potentially add a new
partition to an existing set of segments. For
-a task that is overwriting (like a Hadoop task, or an index task _not_ in
append mode) this is done by locking an
-interval and creating a new version number and new set of segments.
-2. If the indexing task is a realtime task (like a Kafka task) then the
segment is immediately queryable at this point.
-It's available, but unpublished.
-3. When the indexing task has finished reading data for the segment, it pushes
it to deep storage and then publishes it
-by writing a record into the metadata store.
-4. If the indexing task is a realtime task, then to ensure data is
continuously available for queries, it waits for a Historical process to load
the segment. If the
-indexing task is not a realtime task, it exits immediately.
-
-On the Coordinator / Historical side:
-
-1. The Coordinator polls the metadata store periodically (by default, every 1
minute) for newly published segments.
-2. When the Coordinator finds a segment that is published and used, but
unavailable, it chooses a Historical process
-to load that segment and instructs that Historical to do so.
-3. The Historical loads the segment and begins serving it.
-4. At this point, if the indexing task was waiting for handoff, it will exit.
-
-### Segment identifiers
-
-Segments all have a four-part identifier with the following components:
-
-- Datasource name.
-- Time interval (for the time chunk containing the segment; this corresponds
to the `segmentGranularity` specified
-at ingestion time).
-- Version number (generally an ISO8601 timestamp corresponding to when the
segment set was first started).
-- Partition number (an integer, unique within a datasource+interval+version;
may not necessarily be contiguous).
-
-For example, this is the identifier for a segment in datasource
`clarity-cloud0`, time chunk
-`2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z`, version
`2018-05-21T15:56:09.909Z`, and partition number 1:
-
-```
-clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z_1
-```
-
-Segments with partition number 0 (the first partition in a chunk) omit the
partition number, like the following
-example, which is a segment in the same time chunk as the previous one, but
with partition number 0 instead of 1:
-
-```
-clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z
-```
-
-### Segment versioning
-
-You may be wondering what the "version number" described in the previous
section is for. Or, you might not be, in which
-case good for you and you can skip this section!
-
-The version number provides a form of [_multi-version concurrency control_](
-https://en.wikipedia.org/wiki/Multiversion_concurrency_control) (MVCC) to
-support batch-mode overwriting. If all you ever do is append data, then there
will be just a
-single version for each time chunk. But when you overwrite data, Druid will
seamlessly switch from
-querying the old version to instead query the new, updated versions.
Specifically, a new set of
-segments is created with the same datasource, same time interval, but a higher
version number. This is a signal to the
-rest of the Druid system that the older version should be removed from the
cluster, and the new version should replace
-it.
-
-The switch appears to happen instantaneously to a user, because Druid handles
this by first loading the new data (but
-not allowing it to be queried), and then, as soon as the new data is all
loaded, switching all new queries to use those
-new segments. Then it drops the old segments a few minutes later.
-
-### Segment lifecycle
-
-Each segment has a lifecycle that involves the following three major areas:
-
-1. **Metadata store:** Segment metadata (a small JSON payload generally no
more than a few KB) is stored in the
-[metadata store](../design/metadata-storage.md) once a segment is done being
constructed. The act of inserting
-a record for a segment into the metadata store is called _publishing_. These
metadata records have a boolean flag
-named `used`, which controls whether the segment is intended to be queryable
or not. Segments created by realtime tasks will be
-available before they are published, since they are only published when the
segment is complete and will not accept
-any additional rows of data.
-2. **Deep storage:** Segment data files are pushed to deep storage once a
segment is done being constructed. This
-happens immediately before publishing metadata to the metadata store.
-3. **Availability for querying:** Segments are available for querying on some
Druid data server, like a realtime task, directly from deep storage, or a
Historical process.
-
-You can inspect the state of currently active segments using the Druid SQL
-[`sys.segments` table](../querying/sql-metadata-tables.md#segments-table). It
includes the following flags:
-
-- `is_published`: True if segment metadata has been published to the metadata
store and `used` is true.
-- `is_available`: True if the segment is currently available for querying,
either on a realtime task or Historical
-process.
-- `is_realtime`: True if the segment is _only_ available on realtime tasks.
For datasources that use realtime ingestion,
-this will generally start off `true` and then become `false` as the segment is
published and handed off.
-- `is_overshadowed`: True if the segment is published (with `used` set to
true) and is fully overshadowed by some other
-published segments. Generally this is a transient state, and segments in this
state will soon have their `used` flag
-automatically set to false.
-
-### Availability and consistency
-
-Druid has an architectural separation between ingestion and querying, as
described above in
-[Indexing and handoff](#indexing-and-handoff). This means that when
understanding Druid's availability and
-consistency properties, we must look at each function separately.
-
-On the **ingestion side**, Druid's primary [ingestion
methods](../ingestion/index.md#ingestion-methods) are all
-pull-based and offer transactional guarantees. This means that you are
guaranteed that ingestion using these
-methods will publish in an all-or-nothing manner:
-
-- Supervised "seekable-stream" ingestion methods like
[Kafka](../development/extensions-core/kafka-ingestion.md) and
-[Kinesis](../development/extensions-core/kinesis-ingestion.md). With these
methods, Druid commits stream offsets to its
-[metadata store](#metadata-storage) alongside segment metadata, in the same
transaction. Note that ingestion of data
-that has not yet been published can be rolled back if ingestion tasks fail. In
this case, partially-ingested data is
-discarded, and Druid will resume ingestion from the last committed set of
stream offsets. This ensures exactly-once
-publishing behavior.
-- [Hadoop-based batch ingestion](../ingestion/hadoop.md). Each task publishes
all segment metadata in a single
-transaction.
-- [Native batch ingestion](../ingestion/native-batch.md). In parallel mode,
the supervisor task publishes all segment
-metadata in a single transaction after the subtasks are finished. In simple
(single-task) mode, the single task
-publishes all segment metadata in a single transaction after it is complete.
-
-Additionally, some ingestion methods offer an _idempotency_ guarantee. This
means that repeated executions of the same
-ingestion will not cause duplicate data to be ingested:
-
-- Supervised "seekable-stream" ingestion methods like
[Kafka](../development/extensions-core/kafka-ingestion.md) and
-[Kinesis](../development/extensions-core/kinesis-ingestion.md) are idempotent
due to the fact that stream offsets and
-segment metadata are stored together and updated in lock-step.
-- [Hadoop-based batch ingestion](../ingestion/hadoop.md) is idempotent unless
one of your input sources
-is the same Druid datasource that you are ingesting into. In this case,
running the same task twice is non-idempotent,
-because you are adding to existing data instead of overwriting it.
-- [Native batch ingestion](../ingestion/native-batch.md) is idempotent unless
-[`appendToExisting`](../ingestion/native-batch.md) is true, or one of your
input sources is the same Druid datasource
-that you are ingesting into. In either of these two cases, running the same
task twice is non-idempotent, because you
-are adding to existing data instead of overwriting it.
-
-On the **query side**, the Druid Broker is responsible for ensuring that a
consistent set of segments is involved in a
-given query. It selects the appropriate set of segment versions to use when
the query starts based on what is currently
-available. This is supported by _atomic replacement_, a feature that ensures
that from a user's perspective, queries
-flip instantaneously from an older version of data to a newer set of data,
with no consistency or performance impact.
-(See [segment versioning](#segment-versioning) above.)
-This is used for Hadoop-based batch ingestion, native batch ingestion when
`appendToExisting` is false, and compaction.
-
-Note that atomic replacement happens for each time chunk individually. If a
batch ingestion task or compaction
-involves multiple time chunks, then each time chunk will undergo atomic
replacement soon after the task finishes, but
-the replacements will not all happen simultaneously.
-
-Typically, atomic replacement in Druid is based on a _core set_ concept that
works in conjunction with segment versions.
-When a time chunk is overwritten, a new core set of segments is created with a
higher version number. The core set
-must _all_ be available before the Broker will use them instead of the older
set. There can also only be one core set
-per version per time chunk. Druid will also only use a single version at a
time per time chunk. Together, these
-properties provide Druid's atomic replacement guarantees.
-
-Druid also supports an experimental _segment locking_ mode that is activated
by setting
-[`forceTimeChunkLock`](../ingestion/tasks.md#context) to false in the context
of an ingestion task. In this case, Druid
-creates an _atomic update group_ using the existing version for the time
chunk, instead of creating a new core set
-with a new version number. There can be multiple atomic update groups with the
same version number per time chunk. Each
-one replaces a specific set of earlier segments in the same time chunk and
with the same version number. Druid will
-query the latest one that is fully available. This is a more powerful version
of the core set concept, because it
-enables atomically replacing a subset of data for a time chunk, as well as
doing atomic replacement and appending
-simultaneously.
-
-If segments become unavailable due to multiple Historicals going offline
simultaneously (beyond your replication
-factor), then Druid queries will include only the segments that are still
available. In the background, Druid will
-reload these unavailable segments on other Historicals as quickly as possible,
at which point they will be included in
-queries again.
-
-## Query processing
-
-Queries are distributed across the Druid cluster, and managed by a Broker.
-Queries first enter the [Broker](../design/broker.md), which identifies the
segments with data that may pertain to that query.
-The list of segments is always pruned by time, and may also be pruned by other
attributes depending on how your
-datasource is partitioned. The Broker will then identify which
[Historicals](../design/historical.md) and
-[MiddleManagers](../design/middlemanager.md) are serving those segments and
distributes a rewritten subquery to each of those processes.
-The Historical/MiddleManager processes execute each subquery and return
results to the Broker. The Broker merges the partial results
-to get the final answer, which it returns to the original caller.
-
-Time and attribute pruning is an important way that Druid limits the amount of
data that must be scanned for each query, but it is
-not the only way. For filters at a more granular level than what the Broker
can use for pruning,
-[indexing structures](#datasources-and-segments)
-inside each segment allow Historicals to figure out which (if any) rows match
the filter set before looking at any row of
-data. Once a Historical knows which rows match a particular query, it only
accesses the specific rows and columns it needs for that
-query.
-
-So Druid uses three different techniques to maximize query performance:
-
-- Pruning the set of segments accessed for a query.
-- Within each segment, using indexes to identify which rows must be accessed.
-- Within each segment, only reading the specific rows and columns that are
relevant to a particular query.
-
-For more details about how Druid executes queries, refer to the [Query
execution](../querying/query-execution.md)
-documentation.
+* [Storage components](storage.md) to learn about data storage in Druid.
+* [Segments](segments.md) to learn about segment files.
+* [Query processing](../querying/query-processing.md) for a high-level
overview of how Druid processes queries.
\ No newline at end of file
diff --git a/docs/design/broker.md b/docs/design/broker.md
index 107048a7ada..bbd6b94f2b0 100644
--- a/docs/design/broker.md
+++ b/docs/design/broker.md
@@ -1,6 +1,7 @@
---
id: broker
-title: "Broker"
+title: "Broker service"
+sidebar_label: "Broker"
---
<!--
@@ -23,34 +24,31 @@ title: "Broker"
-->
-### Configuration
+The Broker service routes queries in a distributed cluster setup. It
interprets the metadata published to ZooKeeper about segment distribution
across services and routes queries accordingly. Additionally, the Broker
service consolidates result sets from individual services.
-For Apache Druid Broker Process Configuration, see [Broker
Configuration](../configuration/index.md#broker).
+## Configuration
-For basic tuning guidance for the Broker process, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#broker).
+For Apache Druid Broker service configuration, see [Broker
Configuration](../configuration/index.md#broker).
-### HTTP endpoints
+For basic tuning guidance for the Broker service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#broker).
-For a list of API endpoints supported by the Broker, see [Broker
API](../api-reference/legacy-metadata-api.md#broker).
-
-### Overview
+## HTTP endpoints
-The Broker is the process to route queries to if you want to run a distributed
cluster. It understands the metadata published to ZooKeeper about what segments
exist on what processes and routes queries such that they hit the right
processes. This process also merges the result sets from all of the individual
processes together.
-On start up, Historical processes announce themselves and the segments they
are serving in Zookeeper.
+For a list of API endpoints supported by the Broker, see [Broker
API](../api-reference/legacy-metadata-api.md#broker).
-### Running
+## Running
```
org.apache.druid.cli.Main server broker
```
-### Forwarding queries
+## Forwarding queries
-Most Druid queries contain an interval object that indicates a span of time
for which data is requested. Likewise, Druid [Segments](../design/segments.md)
are partitioned to contain data for some interval of time and segments are
distributed across a cluster. Consider a simple datasource with 7 segments
where each segment contains data for a given day of the week. Any query issued
to the datasource for more than one day of data will hit more than one segment.
These segments will likely b [...]
+Most Druid queries contain an interval object that indicates a span of time
for which data is requested. Similarly, Druid partitions
[segments](../design/segments.md) to contain data for some interval of time and
distributes the segments across a cluster. Consider a simple datasource with
seven segments where each segment contains data for a given day of the week.
Any query issued to the datasource for more than one day of data will hit more
than one segment. These segments will likely b [...]
-To determine which processes to forward queries to, the Broker process first
builds a view of the world from information in Zookeeper. Zookeeper maintains
information about [Historical](../design/historical.md) and streaming ingestion
[Peon](../design/peons.md) processes and the segments they are serving. For
every datasource in Zookeeper, the Broker process builds a timeline of segments
and the processes that serve them. When queries are received for a specific
datasource and interval, [...]
+To determine which services to forward queries to, the Broker service first
builds a view of the world from information in ZooKeeper. ZooKeeper maintains
information about [Historical](../design/historical.md) and streaming ingestion
[Peon](../design/peons.md) services and the segments they are serving. For
every datasource in ZooKeeper, the Broker service builds a timeline of segments
and the services that serve them. When queries are received for a specific
datasource and interval, the [...]
-### Caching
+## Caching
-Broker processes employ a cache with an LRU cache invalidation strategy. The
Broker cache stores per-segment results. The cache can be local to each Broker
process or shared across multiple processes using an external distributed cache
such as [memcached](http://memcached.org/). Each time a broker process receives
a query, it first maps the query to a set of segments. A subset of these
segment results may already exist in the cache and the results can be directly
pulled from the cache. F [...]
-Historical processes. Once the Historical processes return their results, the
Broker will store those results in the cache. Real-time segments are never
cached and hence requests for real-time data will always be forwarded to
real-time processes. Real-time data is perpetually changing and caching the
results would be unreliable.
+Broker services employ a cache with an LRU cache invalidation strategy. The
Broker cache stores per-segment results. The cache can be local to each Broker
service or shared across multiple services using an external distributed cache
such as [memcached](http://memcached.org/). Each time a Broker service receives
a query, it first maps the query to a set of segments. A subset of these
segment results may already exist in the cache and the results can be directly
pulled from the cache. For [...]
+Historical services. Once the Historical services return their results, the
Broker will store those results in the cache. Real-time segments are never
cached and hence requests for real-time data will always be forwarded to
real-time services. Real-time data is perpetually changing and caching the
results would be unreliable.
\ No newline at end of file
diff --git a/docs/design/coordinator.md b/docs/design/coordinator.md
index 4e069238163..e3652d2c344 100644
--- a/docs/design/coordinator.md
+++ b/docs/design/coordinator.md
@@ -1,6 +1,7 @@
---
id: coordinator
title: "Coordinator service"
+sidebar_label: "Coordinator"
---
<!--
@@ -23,67 +24,63 @@ title: "Coordinator service"
-->
-### Configuration
-
-For Apache Druid Coordinator service configuration, see [Coordinator
configuration](../configuration/index.md#coordinator).
-
-For basic tuning guidance for the Coordinator process, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#coordinator).
-
-### HTTP endpoints
-
-For a list of API endpoints supported by the Coordinator, see [Service status
API reference](../api-reference/service-status-api.md#coordinator).
-
-### Overview
-
-The Druid Coordinator process is primarily responsible for segment management
and distribution. More specifically, the
-Druid Coordinator process communicates to Historical processes to load or drop
segments based on configurations. The
-Druid Coordinator is responsible for loading new segments, dropping outdated
segments, ensuring that segments are
-"replicated" (that is, loaded on multiple different Historical nodes) proper
(configured) number of times, and moving
+The Coordinator service is primarily responsible for segment management and
distribution. More specifically, the
+Coordinator service communicates to Historical services to load or drop
segments based on configurations. The Coordinator is responsible for loading
new segments, dropping outdated segments, ensuring that segments are
"replicated" (that is, loaded on multiple different Historical nodes) the proper
(configured) number of times, and moving
("balancing") segments between Historical nodes to keep the latter evenly
loaded.
-The Druid Coordinator runs its duties periodically and the time between each
run is a configurable parameter. On each
+The Coordinator runs its duties periodically and the time between each run is
a configurable parameter. On each
run, the Coordinator assesses the current state of the cluster before deciding
on the appropriate actions to take.
-Similar to the Broker and Historical processes, the Druid Coordinator
maintains a connection to a Zookeeper cluster for
+Similar to the Broker and Historical services, the Coordinator maintains a
connection to a ZooKeeper cluster for
current cluster information. The Coordinator also maintains a connection to a
database containing information about
"used" segments (that is, the segments that *should* be loaded in the cluster)
and the loading rules.
-Before any unassigned segments are serviced by Historical processes, the
Historical processes for each tier are first
+Before any unassigned segments are serviced by Historical services, the
Historical services for each tier are first
sorted in terms of capacity, with least capacity servers having the highest
priority. Unassigned segments are always
-assigned to the processes with least capacity to maintain a level of balance
between processes. The Coordinator does not
-directly communicate with a historical process when assigning it a new
segment; instead the Coordinator creates some
-temporary information about the new segment under load queue path of the
historical process. Once this request is seen,
-the historical process will load the segment and begin servicing it.
+assigned to the services with least capacity to maintain a level of balance
between services. The Coordinator does not
+directly communicate with a Historical service when assigning it a new
segment; instead the Coordinator creates some
+temporary information about the new segment under load queue path of the
Historical service. Once this request is seen,
+the Historical service loads the segment and begins servicing it.
+
+## Configuration
+
+For Apache Druid Coordinator service configuration, see [Coordinator
configuration](../configuration/index.md#coordinator).
+
+For basic tuning guidance for the Coordinator service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#coordinator).
+
+## HTTP endpoints
+
+For a list of API endpoints supported by the Coordinator, see [Service status
API reference](../api-reference/service-status-api.md#coordinator).
-### Running
+## Running
```
org.apache.druid.cli.Main server coordinator
```
-### Rules
+## Rules
Segments can be automatically loaded and dropped from the cluster based on a
set of rules. For more information on rules, see [Rule
Configuration](../operations/rule-configuration.md).
-### Cleaning up segments
+## Cleaning up segments
-On each run, the Druid Coordinator compares the set of used segments in the
database with the segments served by some
-Historical nodes in the cluster. Coordinator sends requests to Historical
nodes to unload unused segments or segments
+On each run, the Coordinator compares the set of used segments in the database
with the segments served by some
+Historical nodes in the cluster. The Coordinator sends requests to Historical
nodes to unload unused segments or segments
that are removed from the database.
Segments that are overshadowed (their versions are too old and their data has
been replaced by newer segments) are
marked as unused. During the next Coordinator's run, they will be unloaded
from Historical nodes in the cluster.
-### Segment availability
+## Segment availability
-If a Historical process restarts or becomes unavailable for any reason, the
Druid Coordinator will notice a process has gone missing and treat all segments
served by that process as being dropped. Given a sufficient period of time, the
segments may be reassigned to other Historical processes in the cluster.
However, each segment that is dropped is not immediately forgotten. Instead,
there is a transitional data structure that stores all dropped segments with an
associated lifetime. The l [...]
+If a Historical service restarts or becomes unavailable for any reason, the
Coordinator will notice a service has gone missing and treat all segments
served by that service as being dropped. Given a sufficient period of time, the
segments may be reassigned to other Historical services in the cluster.
However, each segment that is dropped is not immediately forgotten. Instead,
there is a transitional data structure that stores all dropped segments with an
associated lifetime. The lifetime [...]
-### Balancing segment load
+## Balancing segment load
-To ensure an even distribution of segments across Historical processes in the
cluster, the Coordinator process will find the total size of all segments being
served by every Historical process each time the Coordinator runs. For every
Historical process tier in the cluster, the Coordinator process will determine
the Historical process with the highest utilization and the Historical process
with the lowest utilization. The percent difference in utilization between the
two processes is com [...]
+To ensure an even distribution of segments across Historical services in the
cluster, the Coordinator service will find the total size of all segments being
served by every Historical service each time the Coordinator runs. For every
Historical service tier in the cluster, the Coordinator service will determine
the Historical service with the highest utilization and the Historical service
with the lowest utilization. The percent difference in utilization between the
two services is compu [...]
-### Automatic compaction
+## Automatic compaction
-The Druid Coordinator manages the [automatic compaction
system](../data-management/automatic-compaction.md).
+The Coordinator manages the [automatic compaction
system](../data-management/automatic-compaction.md).
Each run, the Coordinator compacts segments by merging small segments or
splitting a large one. This is useful when the size of your segments is not
optimized which may degrade query performance.
See [Segment size optimization](../operations/segment-optimization.md) for
details.
@@ -108,14 +105,14 @@
druid.coordinator.<SOME_GROUP_NAME>.duties=["compactSegments"]
druid.coordinator.<SOME_GROUP_NAME>.period=<PERIOD_TO_RUN_COMPACTING_SEGMENTS_DUTY>
```
-### Segment search policy in automatic compaction
+## Segment search policy in automatic compaction
At every Coordinator run, this policy looks up time chunks from newest to
oldest and checks whether the segments in those time chunks
need compaction.
A set of segments needs compaction if all conditions below are satisfied:
-1) Total size of segments in the time chunk is smaller than or equal to the
configured `inputSegmentSizeBytes`.
-2) Segments have never been compacted yet or compaction spec has been updated
since the last compaction: `maxTotalRows` or `indexSpec`.
+* Total size of segments in the time chunk is smaller than or equal to the
configured `inputSegmentSizeBytes`.
+* Segments have never been compacted yet or compaction spec has been updated
since the last compaction: `maxTotalRows` or `indexSpec`.
Here are some details with an example. Suppose we have two dataSources (`foo`,
`bar`) as seen below:
@@ -147,18 +144,18 @@ For more information, see [Avoid conflicts with
ingestion](../data-management/au
If it finds such segments, it simply skips them.
:::
-### FAQ
+## FAQ
-1. **Do clients ever contact the Coordinator process?**
+1. **Do clients ever contact the Coordinator service?**
The Coordinator is not involved in a query.
- Historical processes never directly contact the Coordinator process. The
Druid Coordinator tells the Historical processes to load/drop data via
Zookeeper, but the Historical processes are completely unaware of the
Coordinator.
+ Historical services never directly contact the Coordinator service. The
Coordinator tells the Historical services to load/drop data via ZooKeeper, but
the Historical services are completely unaware of the Coordinator.
- Brokers also never contact the Coordinator. Brokers base their
understanding of the data topology on metadata exposed by the Historical
processes via ZK and are completely unaware of the Coordinator.
+ Brokers also never contact the Coordinator. Brokers base their
understanding of the data topology on metadata exposed by the Historical
services via ZooKeeper and are completely unaware of the Coordinator.
-2. **Does it matter if the Coordinator process starts up before or after other
processes?**
+2. **Does it matter if the Coordinator service starts up before or after other
services?**
- No. If the Druid Coordinator is not started up, no new segments will be
loaded in the cluster and outdated segments will not be dropped. However, the
Coordinator process can be started up at any time, and after a configurable
delay, will start running Coordinator tasks.
+ No. If the Coordinator is not started up, no new segments will be loaded
in the cluster and outdated segments will not be dropped. However, the
Coordinator service can be started up at any time, and after a configurable
delay, will start running Coordinator tasks.
This also means that if you have a working cluster and all of your
Coordinators die, the cluster will continue to function, it just won’t
experience any changes to its data topology.
diff --git a/docs/design/historical.md b/docs/design/historical.md
index 2fc06d08210..d4a0782ba2a 100644
--- a/docs/design/historical.md
+++ b/docs/design/historical.md
@@ -1,6 +1,7 @@
---
id: historical
title: "Historical service"
+sidebar_label: "Historical"
---
<!--
@@ -22,53 +23,51 @@ title: "Historical service"
~ under the License.
-->
+The Historical service is responsible for storing and querying historical data.
+Historical services cache data segments on local disk and serve queries from
that cache as well as from an in-memory cache.
-### Configuration
+## Configuration
For Apache Druid Historical service configuration, see [Historical
configuration](../configuration/index.md#historical).
For basic tuning guidance for the Historical service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#historical).
-### HTTP endpoints
+## HTTP endpoints
For a list of API endpoints supported by the Historical, please see the
[Service status API
reference](../api-reference/service-status-api.md#historical).
-### Running
+## Running
```
org.apache.druid.cli.Main server historical
```
-### Loading and serving segments
+## Loading and serving segments
-Each Historical process copies or "pulls" segment files from Deep Storage to
local disk in an area called the *segment cache*. Set the
`druid.segmentCache.locations` to configure the size and location of the
segment cache on each Historical process. See [Historical general
configuration](../configuration/index.md#historical-general-configuration).
+Each Historical service copies or pulls segment files from deep storage to
local disk in an area called the segment cache. To configure the size and
location of the segment cache on each Historical service, set the
`druid.segmentCache.locations`.
+For more information, see [Segment cache
size](../operations/basic-cluster-tuning.md#segment-cache-size).
-See the [Tuning
Guide](../operations/basic-cluster-tuning.md#segment-cache-size) for more
information.
+The [Coordinator](../design/coordinator.md) controls the assignment of
segments to Historicals and the balance of segments between Historicals.
Historical services do not communicate directly with each other, nor do they
communicate directly with the Coordinator. Instead, the Coordinator creates
ephemeral entries in ZooKeeper in a [load queue
path](../configuration/index.md#path-configuration). Each Historical service
maintains a connection to ZooKeeper, watching those paths for segment [...]
-The [Coordinator](../design/coordinator.md) controls the assignment of
segments to Historicals and the balance of segments between Historicals.
Historical processes do not communicate directly with each other, nor do they
communicate directly with the Coordinator. Instead, the Coordinator creates
ephemeral entries in Zookeeper in a [load queue
path](../configuration/index.md#path-configuration). Each Historical process
maintains a connection to Zookeeper, watching those paths for segmen [...]
+When a Historical service detects a new entry in the ZooKeeper load queue, it
checks its own segment cache. If no information about the segment exists there,
the Historical service first retrieves metadata from ZooKeeper about the
segment, including where the segment is located in deep storage and how it
needs to decompress and process it.
-For more information about how the Coordinator assigns segments to Historical
processes, see [Coordinator](../design/coordinator.md).
+For more information about segment metadata and Druid segments in general, see
[Segments](../design/segments.md).
-When a Historical process detects a new entry in the Zookeeper load queue, it
checks its own segment cache. If no information about the segment exists there,
the Historical process first retrieves metadata from Zookeeper about the
segment, including where the segment is located in Deep Storage and how it
needs to decompress and process it.
+After a Historical service pulls down and processes a segment from deep
storage, Druid advertises the segment as being available for queries from the
Broker. This announcement by the Historical is made via ZooKeeper, in a [served
segments path](../configuration/index.md#path-configuration).
-For more information about segment metadata and Druid segments in general, see
[Segments](../design/segments.md).
-
-After a Historical process pulls down and processes a segment from Deep
Storage, Druid advertises the segment as being available for queries from the
Broker. This announcement by the Historical is made via Zookeeper, in a
[served segments path](../configuration/index.md#path-configuration).
-
-For more information about how the Broker determines what data is available
for queries, please see [Broker](broker.md).
+For more information about how the Broker determines what data is available
for queries, see [Broker](broker.md).
To make data from the segment cache available for querying as soon as
possible, Historical services search the local segment cache upon startup and
advertise the segments found there.
-### Loading and serving segments from cache
+## Loading and serving segments from cache
-The segment cache uses [memory mapping](https://en.wikipedia.org/wiki/Mmap).
The cache consumes memory from the underlying operating system so Historicals
can hold parts of segment files in memory to increase query performance at the
data level. The in-memory segment cache is affected by the size of the
Historical JVM, heap / direct memory buffers, and other processes on the
operating system itself.
+The segment cache uses [memory mapping](https://en.wikipedia.org/wiki/Mmap).
The cache consumes memory from the underlying operating system so Historicals
can hold parts of segment files in memory to increase query performance at the
data level. The in-memory segment cache is affected by the size of the
Historical JVM, heap / direct memory buffers, and other services on the
operating system itself.
-At query time, if the required part of a segment file is available in the
memory mapped cache or "page cache", the Historical re-uses it and reads it
directly from memory. If it is not in the memory-mapped cache, the Historical
reads that part of the segment from disk. In this case, there is potential for
new data to flush other segment data from memory. This means that if free
operating system memory is close to `druid.server.maxSize`, the more likely
that segment data will be availabl [...]
+At query time, if the required part of a segment file is available in the
memory mapped cache or "page cache", the Historical re-uses it and reads it
directly from memory. If it is not in the memory-mapped cache, the Historical
reads that part of the segment from disk. In this case, there is potential for
new data to flush other segment data from memory. This means that if free
operating system memory is close to `druid.server.maxSize`, the more likely
that segment data will be available [...]
Note that this memory-mapped segment cache is in addition to other
[query-level caches](../querying/caching.md).
-### Querying segments
-
-Please see [Querying](../querying/querying.md) for more information on
querying Historical processes.
+## Querying segments
-A Historical can be configured to log and report metrics for every query it
services.
+You can configure a Historical service to log and report metrics for every
query it services.
+For information on querying Historical services, see
[Querying](../querying/querying.md).
diff --git a/docs/design/indexer.md b/docs/design/indexer.md
index 0cb7fbad910..ae9254b9cc2 100644
--- a/docs/design/indexer.md
+++ b/docs/design/indexer.md
@@ -1,6 +1,7 @@
---
layout: doc_page
-title: "Indexer Process"
+title: "Indexer service"
+sidebar_label: "Indexer"
---
<!--
@@ -27,69 +28,68 @@ title: "Indexer Process"
Its memory management system is still under development and will be
significantly enhanced in later releases.
:::
-The Apache Druid Indexer process is an alternative to the MiddleManager + Peon
task execution system. Instead of forking a separate JVM process per-task, the
Indexer runs tasks as separate threads within a single JVM process.
+The Apache Druid Indexer service is an alternative to the MiddleManager + Peon
task execution system. Instead of forking a separate JVM process per-task, the
Indexer runs tasks as separate threads within a single JVM process.
The Indexer is designed to be easier to configure and deploy compared to the
MiddleManager + Peon system and to better enable resource sharing across tasks.
-### Configuration
+## Configuration
-For Apache Druid Indexer Process Configuration, see [Indexer
Configuration](../configuration/index.md#indexer).
+For Apache Druid Indexer service configuration, see [Indexer
Configuration](../configuration/index.md#indexer).
-### HTTP endpoints
+## HTTP endpoints
-The Indexer process shares the same HTTP endpoints as the
[MiddleManager](../api-reference/service-status-api.md#middlemanager).
+The Indexer service shares the same HTTP endpoints as the
[MiddleManager](../api-reference/service-status-api.md#middlemanager).
-### Running
+## Running
```
org.apache.druid.cli.Main server indexer
```
-### Task resource sharing
+## Task resource sharing
-The following resources are shared across all tasks running inside an Indexer
process.
+The following resources are shared across all tasks running inside the Indexer
service.
-#### Query resources
+### Query resources
-The query processing threads and buffers are shared across all tasks. The
Indexer will serve queries from a single endpoint shared by all tasks.
+The query processing threads and buffers are shared across all tasks. The
Indexer serves queries from a single endpoint shared by all tasks.
If [query caching](../configuration/index.md#indexer-caching) is enabled, the
query cache is also shared across all tasks.
-#### Server HTTP threads
-
-The Indexer maintains two equally sized pools of HTTP threads.
+### Server HTTP threads
+The Indexer maintains two equally sized pools of HTTP threads.
One pool is exclusively used for task control messages between the Overlord
and the Indexer ("chat handler threads"). The other pool is used for handling
all other HTTP requests.
-The size of the pools are configured by the `druid.server.http.numThreads`
configuration (e.g., if this is set to 10, there will be 10 chat handler
threads and 10 non-chat handler threads).
+To configure the number of threads, use the `druid.server.http.numThreads`
property. For example, if `druid.server.http.numThreads` is set to 10, there
will be 10 chat handler threads and 10 non-chat handler threads.
-In addition to these two pools, 2 separate threads are allocated for lookup
handling. If lookups are not used, these threads will not be used.
+In addition to these two pools, the Indexer allocates two separate threads for
lookup handling. If lookups are not used, these threads will not be used.
-#### Memory sharing
+### Memory sharing
-The Indexer uses the `druid.worker.globalIngestionHeapLimitBytes`
configuration to impose a global heap limit across all of the tasks it is
running.
+The Indexer uses the `druid.worker.globalIngestionHeapLimitBytes` property to
impose a global heap limit across all of the tasks it is running.
This global limit is evenly divided across the number of task slots configured
by `druid.worker.capacity`.
-To apply the per-task heap limit, the Indexer will override `maxBytesInMemory`
in task tuning configs (i.e., ignoring the default value or any user configured
value). `maxRowsInMemory` will also be overridden to an essentially unlimited
value: the Indexer does not support row limits.
+To apply the per-task heap limit, the Indexer overrides `maxBytesInMemory` in
task tuning configurations, that is ignoring the default value or any user
configured value. It also overrides `maxRowsInMemory` to an essentially
unlimited value: the Indexer does not support row limits.
-By default, `druid.worker.globalIngestionHeapLimitBytes` is set to 1/6th of
the available JVM heap. This default is chosen to align with the default value
of `maxBytesInMemory` in task tuning configs when using the MiddleManager/Peon
system, which is also 1/6th of the JVM heap.
+By default, `druid.worker.globalIngestionHeapLimitBytes` is set to 1/6th of
the available JVM heap. This default is chosen to align with the default value
of `maxBytesInMemory` in task tuning configs when using the MiddleManager +
Peon system, which is also 1/6th of the JVM heap.
The peak usage for rows held in heap memory relates to the interaction between
the `maxBytesInMemory` and `maxPendingPersists` properties in the task tuning
configs. When the amount of row data held in-heap by a task reaches the limit
specified by `maxBytesInMemory`, a task will persist the in-heap row data.
After the persist has been started, the task can again ingest up to
`maxBytesInMemory` bytes worth of row data while the persist is running.
-This means that the peak in-heap usage for row data can be up to approximately
`maxBytesInMemory` * (2 + `maxPendingPersists`). The default value of
`maxPendingPersists` is 0, which allows for 1 persist to run concurrently with
ingestion work.
+This means that the peak in-heap usage for row data can be up to approximately
`maxBytesInMemory * (2 + maxPendingPersists)`. The default value of
`maxPendingPersists` is 0, which allows for 1 persist to run concurrently with
ingestion work.
The remaining portion of the heap is reserved for query processing and segment
persist/merge operations, and miscellaneous heap usage.
-#### Concurrent segment persist/merge limits
+### Concurrent segment persist/merge limits
To help reduce peak memory usage, the Indexer imposes a limit on the number of
concurrent segment persist/merge operations across all running tasks.
-By default, the number of concurrent persist/merge operations is limited to
(`druid.worker.capacity` / 2), rounded down. This limit can be configured with
the `druid.worker.numConcurrentMerges` property.
+By default, the number of concurrent persist/merge operations is limited to
`(druid.worker.capacity / 2)`, rounded down. This limit can be configured with
the `druid.worker.numConcurrentMerges` property.
-### Current limitations
+## Current limitations
-Separate task logs are not currently supported when using the Indexer; all
task log messages will instead be logged in the Indexer process log.
+Separate task logs are not currently supported when using the Indexer; all
task log messages will instead be logged in the Indexer service log.
The Indexer currently imposes an identical memory limit on each task. In later
releases, the per-task memory limit will be removed and only the global limit
will apply. The limit on concurrent merges will also be removed.
diff --git a/docs/design/middlemanager.md b/docs/design/middlemanager.md
index a5929ec3052..28738295a05 100644
--- a/docs/design/middlemanager.md
+++ b/docs/design/middlemanager.md
@@ -1,6 +1,7 @@
---
id: middlemanager
title: "MiddleManager service"
+sidebar_label: "MiddleManager"
---
<!--
@@ -22,23 +23,20 @@ title: "MiddleManager service"
~ under the License.
-->
+The MiddleManager service is a worker service that executes submitted tasks.
MiddleManagers forward tasks to [Peons](../design/peons.md) that run in
separate JVMs.
+Druid uses separate JVMs for tasks to isolate resources and logs. Each Peon is
capable of running only one task at a time, whereas a MiddleManager may have
multiple Peons.
-### Configuration
+## Configuration
For Apache Druid MiddleManager service configuration, see [MiddleManager and
Peons](../configuration/index.md#middlemanager-and-peons).
For basic tuning guidance for the MiddleManager service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#middlemanager).
-### HTTP endpoints
+## HTTP endpoints
-For a list of API endpoints supported by the MiddleManager, please see the
[Service status API
reference](../api-reference/service-status-api.md#middlemanager).
+For a list of API endpoints supported by the MiddleManager, see the [Service
status API reference](../api-reference/service-status-api.md#middlemanager).
-### Overview
-
-The MiddleManager process is a worker process that executes submitted tasks.
Middle Managers forward tasks to Peons that run in separate JVMs.
-The reason we have separate JVMs for tasks is for resource and log isolation.
Each [Peon](../design/peons.md) is capable of running only one task at a time,
however, a MiddleManager may have multiple Peons.
-
-### Running
+## Running
```
org.apache.druid.cli.Main server middleManager
diff --git a/docs/design/overlord.md b/docs/design/overlord.md
index 17580a3fafa..83be16db789 100644
--- a/docs/design/overlord.md
+++ b/docs/design/overlord.md
@@ -1,6 +1,7 @@
---
id: overlord
title: "Overlord service"
+sidebar_label: "Overlord"
---
<!--
@@ -23,24 +24,22 @@ title: "Overlord service"
-->
-### Configuration
-
-For Apache Druid Overlord Process Configuration, see [Overlord
Configuration](../configuration/index.md#overlord).
+The Overlord service is responsible for accepting tasks, coordinating task
distribution, creating locks around tasks, and returning statuses to callers.
The Overlord can be configured to run in one of two modes: local or remote
(local being the default).
+In local mode, the Overlord is also responsible for creating Peons for
executing tasks. When running the Overlord in local mode, all MiddleManager and
Peon configurations must be provided as well.
+Local mode is typically used for simple workflows. In remote mode, the
Overlord and MiddleManager are run in separate services and you can run each on
a different server.
+This mode is recommended if you intend to use the indexing service as the
single endpoint for all Druid indexing.
-For basic tuning guidance for the Overlord process, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#overlord).
+## Configuration
-### HTTP endpoints
+For Apache Druid Overlord service configuration, see [Overlord
Configuration](../configuration/index.md#overlord).
-For a list of API endpoints supported by the Overlord, please see the [Service
status API reference](../api-reference/service-status-api.md#overlord).
+For basic tuning guidance for the Overlord service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#overlord).
-### Overview
+## HTTP endpoints
-The Overlord process is responsible for accepting tasks, coordinating task
distribution, creating locks around tasks, and returning statuses to callers.
Overlord can be configured to run in one of two modes - local or remote (local
being default).
-In local mode Overlord is also responsible for creating Peons for executing
tasks. When running the Overlord in local mode, all MiddleManager and Peon
configurations must be provided as well.
-Local mode is typically used for simple workflows. In remote mode, the
Overlord and MiddleManager are run in separate processes and you can run each
on a different server.
-This mode is recommended if you intend to use the indexing service as the
single endpoint for all Druid indexing.
+For a list of API endpoints supported by the Overlord, please see the [Service
status API reference](../api-reference/service-status-api.md#overlord).
-### Blacklisted workers
+## Blacklisted workers
If a MiddleManager has task failures above a threshold, the Overlord will
blacklist these MiddleManagers. No more than 20% of the MiddleManagers can be
blacklisted. Blacklisted MiddleManagers will be periodically whitelisted.
@@ -53,8 +52,8 @@ druid.indexer.runner.workerBlackListCleanupPeriod
druid.indexer.runner.maxPercentageBlacklistWorkers
```
-### Autoscaling
+## Autoscaling
-The Autoscaling mechanisms currently in place are tightly coupled with our
deployment infrastructure but the framework should be in place for other
implementations. We are highly open to new implementations or extensions of the
existing mechanisms. In our own deployments, MiddleManager processes are Amazon
AWS EC2 nodes and they are provisioned to register themselves in a
[galaxy](https://github.com/ning/galaxy) environment.
+The autoscaling mechanisms currently in place are tightly coupled with our
deployment infrastructure but the framework should be in place for other
implementations. We are highly open to new implementations or extensions of the
existing mechanisms. In our own deployments, MiddleManager services are Amazon
AWS EC2 nodes and they are provisioned to register themselves in a
[galaxy](https://github.com/ning/galaxy) environment.
If autoscaling is enabled, new MiddleManagers may be added when a task has
been in pending state for too long. MiddleManagers may be terminated if they
have not run any tasks for a period of time.
diff --git a/docs/design/peons.md b/docs/design/peons.md
index e1348a25763..8c2a73a069a 100644
--- a/docs/design/peons.md
+++ b/docs/design/peons.md
@@ -1,6 +1,7 @@
---
id: peons
-title: "Peons"
+title: "Peon service"
+sidebar_label: "Peon"
---
<!--
@@ -22,21 +23,22 @@ title: "Peons"
~ under the License.
-->
+The Peon service is a task execution engine spawned by the MiddleManager. Each
Peon runs a separate JVM and is responsible for executing a single task. Peons
always run on the same host as the MiddleManager that spawned them.
-### Configuration
+## Configuration
-For Apache Druid Peon Configuration, see [Peon Query
Configuration](../configuration/index.md#peon-query-configuration) and
[Additional Peon
Configuration](../configuration/index.md#additional-peon-configuration).
+For Apache Druid Peon configuration, see [Peon Query
Configuration](../configuration/index.md#peon-query-configuration) and
[Additional Peon
Configuration](../configuration/index.md#additional-peon-configuration).
For basic tuning guidance for MiddleManager tasks, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#task-configurations).
-### HTTP endpoints
+## HTTP endpoints
-Peons run a single task in a single JVM. MiddleManager is responsible for
creating Peons for running tasks.
-Peons should rarely (if ever for testing purposes) be run on their own.
+Peons run a single task in a single JVM. The MiddleManager is responsible for
creating Peons for running tasks.
+Peons should rarely run on their own.
-### Running
+## Running
-The Peon should very rarely ever be run independent of the MiddleManager
unless for development purposes.
+The Peon should seldom run separately from the MiddleManager, except for
development purposes.
```
org.apache.druid.cli.Main internal peon <task_file> <status_file>
diff --git a/docs/design/processes.md b/docs/design/processes.md
deleted file mode 100644
index c802f27b28d..00000000000
--- a/docs/design/processes.md
+++ /dev/null
@@ -1,143 +0,0 @@
----
-id: processes
-title: "Processes and servers"
----
-
-<!--
- ~ Licensed to the Apache Software Foundation (ASF) under one
- ~ or more contributor license agreements. See the NOTICE file
- ~ distributed with this work for additional information
- ~ regarding copyright ownership. The ASF licenses this file
- ~ to you under the Apache License, Version 2.0 (the
- ~ "License"); you may not use this file except in compliance
- ~ with the License. You may obtain a copy of the License at
- ~
- ~ http://www.apache.org/licenses/LICENSE-2.0
- ~
- ~ Unless required by applicable law or agreed to in writing,
- ~ software distributed under the License is distributed on an
- ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- ~ KIND, either express or implied. See the License for the
- ~ specific language governing permissions and limitations
- ~ under the License.
- -->
-
-
-## Process types
-
-Druid has several process types:
-
-* [Coordinator](../design/coordinator.md)
-* [Overlord](../design/overlord.md)
-* [Broker](../design/broker.md)
-* [Historical](../design/historical.md)
-* [MiddleManager](../design/middlemanager.md) and [Peons](../design/peons.md)
-* [Indexer (Optional)](../design/indexer.md)
-* [Router (Optional)](../design/router.md)
-
-## Server types
-
-Druid processes can be deployed any way you like, but for ease of deployment
we suggest organizing them into three server types:
-
-* **Master**
-* **Query**
-* **Data**
-
-
-
-This section describes the Druid processes and the suggested Master/Query/Data
server organization, as shown in the architecture diagram above.
-
-### Master server
-
-A Master server manages data ingestion and availability: it is responsible for
starting new ingestion jobs and coordinating availability of data on the "Data
servers" described below.
-
-Within a Master server, functionality is split between two processes, the
Coordinator and Overlord.
-
-#### Coordinator process
-
-[**Coordinator**](../design/coordinator.md) processes watch over the
Historical processes on the Data servers. They are responsible for assigning
segments to specific servers, and for ensuring segments are well-balanced
across Historicals.
-
-#### Overlord process
-
-[**Overlord**](../design/overlord.md) processes watch over the MiddleManager
processes on the Data servers and are the controllers of data ingestion into
Druid. They are responsible for assigning ingestion tasks to MiddleManagers and
for coordinating segment publishing.
-
-### Query server
-
-A Query server provides the endpoints that users and client applications
interact with, routing queries to Data servers or other Query servers (and
optionally proxied Master server requests as well).
-
-Within a Query server, functionality is split between two processes, the
Broker and Router.
-
-#### Broker process
-
-[**Broker**](../design/broker.md) processes receive queries from external
clients and forward those queries to Data servers. When Brokers receive results
from those subqueries, they merge those results and return them to the
-caller. End users typically query Brokers rather than querying Historicals or
MiddleManagers processes on Data servers directly.
-
-#### Router process (optional)
-
-[**Router**](../design/router.md) processes are _optional_ processes that
provide a unified API gateway in front of Druid Brokers,
-Overlords, and Coordinators. They are optional since you can also simply
contact the Druid Brokers, Overlords, and
-Coordinators directly.
-
-The Router also runs the [web console](../operations/web-console.md), a
management UI for datasources, segments, tasks, data processes (Historicals and
MiddleManagers), and coordinator dynamic configuration. The user can also run
SQL and native Druid queries within the console.
-
-### Data server
-
-A Data server executes ingestion jobs and stores queryable data.
-
-Within a Data server, functionality is split between two processes, the
Historical and MiddleManager.
-
-### Historical process
-
-[**Historical**](../design/historical.md) processes are the workhorses that
handle storage and querying on "historical" data
-(including any streaming data that has been in the system long enough to be
committed). Historical processes
-download segments from deep storage and respond to queries about these
segments. They don't accept writes.
-
-### Middle Manager process
-
-[**MiddleManager**](../design/middlemanager.md) processes handle ingestion of
new data into the cluster. They are responsible
-for reading from external data sources and publishing new Druid segments.
-
-#### Peon processes
-
-[**Peon**](../design/peons.md) processes are task execution engines spawned by
MiddleManagers. Each Peon runs a separate JVM and is responsible for executing
a single task. Peons always run on the same host as the MiddleManager that
spawned them.
-
-### Indexer process (optional)
-
-[**Indexer**](../design/indexer.md) processes are an alternative to
MiddleManagers and Peons. Instead of
-forking separate JVM processes per-task, the Indexer runs tasks as individual
threads within a single JVM process.
-
-The Indexer is designed to be easier to configure and deploy compared to the
MiddleManager + Peon system and to
-better enable resource sharing across tasks. The Indexer is a newer feature
and is currently designated
-[experimental](../development/experimental.md) due to the fact that its memory
management system is still under
-development. It will continue to mature in future versions of Druid.
-
-Typically, you would deploy either MiddleManagers or Indexers, but not both.
-
-## Pros and cons of colocation
-
-Druid processes can be colocated based on the Master/Data/Query server
organization as
-described above. This organization generally results in better utilization of
-hardware resources for most clusters.
-
-For very large scale clusters, however, it can be desirable to split the Druid
processes
-such that they run on individual servers to avoid resource contention.
-
-This section describes guidelines and configuration parameters related to
process colocation.
-
-### Coordinators and Overlords
-
-The workload on the Coordinator process tends to increase with the number of
segments in the cluster. The Overlord's workload also increases based on the
number of segments in the cluster, but to a lesser degree than the Coordinator.
-
-In clusters with very high segment counts, it can make sense to separate the
Coordinator and Overlord processes to provide more resources for the
Coordinator's segment balancing workload.
-
-#### Unified Process
-
-The Coordinator and Overlord processes can be run as a single combined process
by setting the `druid.coordinator.asOverlord.enabled` property.
-
-Please see [Coordinator Configuration:
Operation](../configuration/index.md#coordinator-operation) for details.
-
-### Historicals and MiddleManagers
-
-With higher levels of ingestion or query load, it can make sense to deploy the
Historical and MiddleManager processes on separate hosts to to avoid CPU and
memory contention.
-
-The Historical also benefits from having free memory for memory mapped
segments, which can be another reason to deploy the Historical and
MiddleManager processes separately.
diff --git a/docs/design/router.md b/docs/design/router.md
index 4c2b19fb8e6..ffe9358e488 100644
--- a/docs/design/router.md
+++ b/docs/design/router.md
@@ -1,6 +1,7 @@
---
id: router
-title: "Router Process"
+title: "Router service"
+sidebar_label: "Router"
---
<!--
@@ -22,44 +23,43 @@ title: "Router Process"
~ under the License.
-->
-The Apache Druid Router process can be used to route queries to different
Broker processes. By default, the broker routes queries based on how
[Rules](../operations/rule-configuration.md) are set up. For example, if 1
month of recent data is loaded into a `hot` cluster, queries that fall within
the recent month can be routed to a dedicated set of brokers. Queries outside
this range are routed to another set of brokers. This set up provides query
isolation such that queries for more impor [...]
+The Router service distributes queries between different Broker services. By
default, the Broker routes queries based on preconfigured [data retention
rules](../operations/rule-configuration.md). For example, if one month of
recent data is loaded into a `hot` cluster, queries that fall within the recent
month can be routed to a dedicated set of Brokers. Queries outside this range
are routed to another set of Brokers. This setup provides query isolation such
that queries for more importa [...]
-For query routing purposes, you should only ever need the Router process if
you have a Druid cluster well into the terabyte range.
+For query routing purposes, you should only ever need the Router service if
you have a Druid cluster well into the terabyte range.
-In addition to query routing, the Router also runs the [web
console](../operations/web-console.md), a management UI for datasources,
segments, tasks, data processes (Historicals and MiddleManagers), and
coordinator dynamic configuration. The user can also run SQL and native Druid
queries within the console.
+In addition to query routing, the Router also runs the [web
console](../operations/web-console.md), a UI for loading data, managing
datasources and tasks, and viewing server status and segment information.
-### Configuration
+## Configuration
-For Apache Druid Router Process Configuration, see [Router
Configuration](../configuration/index.md#router).
+For Apache Druid Router service configuration, see [Router
configuration](../configuration/index.md#router).
-For basic tuning guidance for the Router process, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#router).
+For basic tuning guidance for the Router service, see [Basic cluster
tuning](../operations/basic-cluster-tuning.md#router).
-### HTTP endpoints
+## HTTP endpoints
For a list of API endpoints supported by the Router, see [Legacy metadata API
reference](../api-reference/legacy-metadata-api.md#datasource-information).
-### Running
+## Running
```
org.apache.druid.cli.Main server router
```
-### Router as management proxy
+## Router as management proxy
-The Router can be configured to forward requests to the active Coordinator or
Overlord process. This may be useful for
-setting up a highly available cluster in situations where the HTTP redirect
mechanism of the inactive -> active
-Coordinator/Overlord does not function correctly (servers are behind a load
balancer, the hostname used in the redirect
-is only resolvable internally, etc.).
+You can configure the Router to forward requests to the active Coordinator or
Overlord service. This may be useful for
+setting up a highly available cluster in situations where the HTTP redirect
mechanism of the inactive to active
+Coordinator or Overlord service does not function correctly, such as when
servers are behind a load balancer or the hostname used in the redirect is only
resolvable internally.
-#### Enabling the management proxy
+### Enable the management proxy
-To enable this functionality, set the following in the Router's
runtime.properties:
+To enable the management proxy, set the following in the Router's
`runtime.properties`:
```
druid.router.managementProxy.enabled=true
```
-#### Management proxy routing
+### Management proxy routing
The management proxy supports implicit and explicit routes. Implicit routes
are those where the destination can be
determined from the original request path based on Druid API path conventions.
For the Coordinator the convention is
@@ -67,10 +67,10 @@ determined from the original request path based on Druid
API path conventions. F
that using the management proxy does not require modifying the API request
other than issuing the request to the Router
instead of the Coordinator or Overlord. Most Druid API requests can be routed
implicitly.
-Explicit routes are those where the request to the Router contains a path
prefix indicating which process the request
+Explicit routes are those where the request to the Router contains a path
prefix indicating which service the request
should be routed to. For the Coordinator this prefix is `/proxy/coordinator`
and for the Overlord it is `/proxy/overlord`.
This is required for API calls with an ambiguous destination. For example, the
`/status` API is present on all Druid
-processes, so explicit routing needs to be used to indicate the proxy
destination.
+services, so explicit routing needs to be used to indicate the proxy
destination.
This is summarized in the table below:
@@ -81,11 +81,11 @@ This is summarized in the table below:
|`/proxy/coordinator/*`|Coordinator|`/*`|`router:8888/proxy/coordinator/status`
-> `coordinator:8081/status`|
|`/proxy/overlord/*`|Overlord|`/*`|`router:8888/proxy/overlord/druid/indexer/v1/isLeader`
-> `overlord:8090/druid/indexer/v1/isLeader`|
-### Router strategies
+## Router strategies
-The Router has a configurable list of strategies for how it selects which
Brokers to route queries to. The order of the strategies matter because as soon
as a strategy condition is matched, a Broker is selected.
+The Router has a configurable list of strategies to determine which Brokers to
route queries to. The order of the strategies is important because the Broker
is selected immediately after the strategy condition is satisfied.
-#### timeBoundary
+### timeBoundary
```json
{
@@ -93,9 +93,9 @@ The Router has a configurable list of strategies for how it
selects which Broker
}
```
-Including this strategy means all timeBoundary queries are always routed to
the highest priority Broker.
+Including this strategy means all `timeBoundary` queries are always routed to
the highest priority Broker.
-#### priority
+### priority
```json
{
@@ -105,14 +105,14 @@ Including this strategy means all timeBoundary queries
are always routed to the
}
```
-Queries with a priority set to less than minPriority are routed to the lowest
priority Broker. Queries with priority set to greater than maxPriority are
routed to the highest priority Broker. By default, minPriority is 0 and
maxPriority is 1. Using these default values, if a query with priority 0 (the
default query priority is 0) is sent, the query skips the priority selection
logic.
+Queries with a priority set to less than `minPriority` are routed to the
lowest priority Broker. Queries with priority set to greater than `maxPriority`
are routed to the highest priority Broker. By default, `minPriority` is 0 and
`maxPriority` is 1. Using these default values, if a query with priority 0 (the
default query priority is 0) is sent, the query skips the priority selection
logic.
-#### manual
+### manual
-This strategy reads the parameter `brokerService` from the query context and
routes the query to that broker service. If no valid `brokerService` is
specified in the query context, the field `defaultManualBrokerService` is used
to determine target broker service given the value is valid and non-null. A
value is considered valid if it is present in `druid.router.tierToBrokerMap`
-This strategy can route both Native and SQL queries (when enabled).
+This strategy reads the parameter `brokerService` from the query context and
routes the query to that broker service. If no valid `brokerService` is
specified in the query context, the field `defaultManualBrokerService` is used
to determine the target broker service, given the value is valid and non-null. A
value is considered valid if it is present in `druid.router.tierToBrokerMap`.
+This strategy can route both native and SQL queries.
-*Example*: A strategy that routes queries to the Broker "druid:broker-hot" if
no valid `brokerService` is found in the query context.
+The following example strategy routes queries to the Broker `druid:broker-hot`
if no valid `brokerService` is found in the query context.
```json
{
@@ -121,11 +121,11 @@ This strategy can route both Native and SQL queries (when
enabled).
}
```
-#### JavaScript
+### JavaScript
-Allows defining arbitrary routing rules using a JavaScript function. The
function is passed the configuration and the query to be executed, and returns
the tier it should be routed to, or null for the default tier.
+Allows defining arbitrary routing rules using a JavaScript function. The
function takes the configuration and the query to be executed, and returns the
tier it should be routed to, or null for the default tier.
-*Example*: a function that sends queries containing more than three
aggregators to the lowest priority Broker.
+The following example function sends queries containing more than three
aggregators to the lowest priority Broker.
```json
{
@@ -138,12 +138,12 @@ Allows defining arbitrary routing rules using a
JavaScript function. The functio
JavaScript-based functionality is disabled by default. Please refer to the
Druid [JavaScript programming guide](../development/javascript.md) for
guidelines about using Druid's JavaScript functionality, including instructions
on how to enable it.
:::
-### Routing of SQL queries using strategies
+## Routing of SQL queries using strategies
-To enable routing of SQL queries using strategies, set
`druid.router.sql.enable` to `true`. The broker service for a
+To enable routing of SQL queries using strategies, set
`druid.router.sql.enable` to `true`. The Broker service for a
given SQL query is resolved using only the provided Router strategies. If not
resolved using any of the strategies, the
Router uses the `defaultBrokerServiceName`. This behavior is slightly
different from native queries where the Router
-first tries to resolve the broker service using strategies, then load rules
and finally using the `defaultBrokerServiceName`
+first tries to resolve the Broker service using strategies, then load rules
and finally using the `defaultBrokerServiceName`
if still not resolved. When `druid.router.sql.enable` is set to `false`
(default value), the Router uses the
`defaultBrokerServiceName`.
@@ -151,7 +151,7 @@ Setting `druid.router.sql.enable` does not affect either
Avatica JDBC requests o
Druid always routes native queries using the strategies and load rules as
documented.
Druid always routes Avatica JDBC requests based on connection ID.
-### Avatica query balancing
+## Avatica query balancing
All Avatica JDBC requests with a given connection ID must be routed to the
same Broker, since Druid Brokers do not share connection state with each other.
@@ -159,7 +159,7 @@ To accomplish this, Druid provides two built-in balancers
that use rendezvous ha
Note that when multiple Routers are used, all Routers should have identical
balancer configuration to ensure that they make the same routing decisions.
-#### Rendezvous hash balancer
+### Rendezvous hash balancer
This balancer uses [Rendezvous
Hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing) on an Avatica
request's connection ID to assign the request to a Broker.
@@ -169,9 +169,9 @@ To use this balancer, specify the following property:
druid.router.avatica.balancer.type=rendezvousHash
```
-If no `druid.router.avatica.balancer` property is set, the Router will also
default to using the Rendezvous Hash Balancer.
+If no `druid.router.avatica.balancer` property is set, the Router defaults to
using the rendezvous hash balancer.
-#### Consistent hash balancer
+### Consistent hash balancer
This balancer uses [Consistent
Hashing](https://en.wikipedia.org/wiki/Consistent_hashing) on an Avatica
request's connection ID to assign the request to a Broker.
@@ -183,8 +183,7 @@ druid.router.avatica.balancer.type=consistentHash
This is a non-default implementation that is provided for experimentation
purposes. The consistent hasher has longer setup times on initialization and
when the set of Brokers changes, but has a faster Broker assignment time than
the rendezvous hasher when tested with 5 Brokers. Benchmarks for both
implementations have been provided in `ConsistentHasherBenchmark` and
`RendezvousHasherBenchmark`. The consistent hasher also requires locking, while
the rendezvous hasher does not.
-
-### Example production configuration
+## Example production configuration
In this example, we have two tiers in our production cluster: `hot` and
`_default_tier`. Queries for the `hot` tier are routed through the `broker-hot`
set of Brokers, and queries for the `_default_tier` are routed through the
`broker-cold` set of Brokers. If any exceptions or network problems occur,
queries are routed to the `broker-cold` set of brokers. In our example, we are
running with a c3.2xlarge EC2 instance. We assume a `common.runtime.properties`
already exists.
diff --git a/docs/design/storage.md b/docs/design/storage.md
new file mode 100644
index 00000000000..da0df61f545
--- /dev/null
+++ b/docs/design/storage.md
@@ -0,0 +1,140 @@
+---
+id: storage
+title: "Storage overview"
+sidebar_label: "Storage"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Druid stores data in datasources, which are similar to tables in a traditional
RDBMS. Each datasource is partitioned by time and, optionally, further
partitioned by other attributes. Each time range is called a chunk (for
example, a single day, if your datasource is partitioned by day). Within a
chunk, data is partitioned into one or more [segments](../design/segments.md).
Each segment is a single file, typically comprising up to a few million rows of
data. Since segments are organized i [...]
+
+
+
+A datasource may have anywhere from just a few segments, up to hundreds of
thousands and even millions of segments. Each segment is created by a
MiddleManager as mutable and uncommitted. Data is queryable as soon as it is
added to an uncommitted segment. The segment building process accelerates later
queries by producing a data file that is compact and indexed:
+
+- Conversion to columnar format
+- Indexing with bitmap indexes
+- Compression
+ - Dictionary encoding with id storage minimization for String columns
+ - Bitmap compression for bitmap indexes
+ - Type-aware compression for all columns
+
+Periodically, segments are committed and published to [deep
storage](#deep-storage), become immutable, and move from MiddleManagers to the
Historical services. An entry about the segment is also written to the
[metadata store](#metadata-storage). This entry is a self-describing bit of
metadata about the segment, including things like the schema of the segment,
its size, and its location on deep storage. These entries tell the Coordinator
what data is available on the cluster.
+
+For details on the segment file format, see [segment files](segments.md).
+
+For details on modeling your data in Druid, see [schema
design](../ingestion/schema-design.md).
+
+## Indexing and handoff
+
+Indexing is the mechanism by which new segments are created, and handoff is
the mechanism by which they are published and served by Historical services.
+
+On the indexing side:
+
+1. An indexing task starts running and building a new segment. It must
determine the identifier of the segment before it starts building it. For a
task that is appending (like a Kafka task, or an index task in append mode)
this is done by calling an "allocate" API on the Overlord to potentially add a
new partition to an existing set of segments. For
+a task that is overwriting (like a Hadoop task, or an index task not in append
mode) this is done by locking an interval and creating a new version number and
new set of segments.
+2. If the indexing task is a realtime task (like a Kafka task) then the
segment is immediately queryable at this point. It's available, but unpublished.
+3. When the indexing task has finished reading data for the segment, it pushes
it to deep storage and then publishes it by writing a record into the metadata
store.
+4. If the indexing task is a realtime task, then to ensure data is
continuously available for queries, it waits for a Historical service to load
the segment. If the indexing task is not a realtime task, it exits immediately.
+
+On the Coordinator / Historical side:
+
+1. The Coordinator polls the metadata store periodically (by default, every 1
minute) for newly published segments.
+2. When the Coordinator finds a segment that is published and used, but
unavailable, it chooses a Historical service to load that segment and instructs
that Historical to do so.
+3. The Historical loads the segment and begins serving it.
+4. At this point, if the indexing task was waiting for handoff, it will exit.
+
+## Segment identifiers
+
+Segments all have a four-part identifier with the following components:
+
+- Datasource name.
+- Time interval (for the time chunk containing the segment; this corresponds
to the `segmentGranularity` specified at ingestion time).
+- Version number (generally an ISO8601 timestamp corresponding to when the
segment set was first started).
+- Partition number (an integer, unique within a datasource+interval+version;
may not necessarily be contiguous).
+
+For example, this is the identifier for a segment in datasource
`clarity-cloud0`, time chunk
+`2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z`, version
`2018-05-21T15:56:09.909Z`, and partition number 1:
+
+```
+clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z_1
+```
+
+Segments with partition number 0 (the first partition in a chunk) omit the
partition number, like the following example, which is a segment in the same
time chunk as the previous one, but with partition number 0 instead of 1:
+
+```
+clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z
+```
+
+## Segment versioning
+
+The version number provides a form of [multi-version concurrency
control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) (MVCC)
to support batch-mode overwriting. If all you ever do is append data, then
there will be just a single version for each time chunk. But when you overwrite
data, Druid will seamlessly switch from querying the old version to instead
query the new, updated versions. Specifically, a new set of segments is created
with the same datasource, same time [...]
+
+The switch appears to happen instantaneously to a user, because Druid handles
this by first loading the new data (but not allowing it to be queried), and
then, as soon as the new data is all loaded, switching all new queries to use
those new segments. Then it drops the old segments a few minutes later.
+
+## Segment lifecycle
+
+Each segment has a lifecycle that involves the following three major areas:
+
+1. **Metadata store:** Segment metadata (a small JSON payload generally no
more than a few KB) is stored in the [metadata
store](../design/metadata-storage.md) once a segment is done being constructed.
The act of inserting a record for a segment into the metadata store is called
publishing. These metadata records have a boolean flag named `used`, which
controls whether the segment is intended to be queryable or not. Segments
created by realtime tasks will be
+available before they are published, since they are only published when the
segment is complete and will not accept any additional rows of data.
+2. **Deep storage:** Segment data files are pushed to deep storage once a
segment is done being constructed. This happens immediately before publishing
metadata to the metadata store.
+3. **Availability for querying:** Segments are available for querying on some
Druid data server, like a realtime task, directly from deep storage, or a
Historical service.
+
+You can inspect the state of currently active segments using the Druid SQL
+[`sys.segments` table](../querying/sql-metadata-tables.md#segments-table). It
includes the following flags:
+
+- `is_published`: True if segment metadata has been published to the metadata
store and `used` is true.
+- `is_available`: True if the segment is currently available for querying,
either on a realtime task or Historical service.
+- `is_realtime`: True if the segment is only available on realtime tasks. For
datasources that use realtime ingestion, this will generally start off `true`
and then become `false` as the segment is published and handed off.
+- `is_overshadowed`: True if the segment is published (with `used` set to
true) and is fully overshadowed by some other published segments. Generally
this is a transient state, and segments in this state will soon have their
`used` flag automatically set to false.
+
+## Availability and consistency
+
+Druid has an architectural separation between ingestion and querying, as
described above in
+[Indexing and handoff](#indexing-and-handoff). This means that when
understanding Druid's availability and consistency properties, we must look at
each function separately.
+
+On the ingestion side, Druid's primary [ingestion
methods](../ingestion/index.md#ingestion-methods) are all pull-based and offer
transactional guarantees. This means that you are guaranteed that ingestion
using these methods will publish in an all-or-nothing manner:
+
+- Supervised "seekable-stream" ingestion methods like
[Kafka](../development/extensions-core/kafka-ingestion.md) and
[Kinesis](../development/extensions-core/kinesis-ingestion.md). With these
methods, Druid commits stream offsets to its [metadata
store](#metadata-storage) alongside segment metadata, in the same transaction.
Note that ingestion of data that has not yet been published can be rolled back
if ingestion tasks fail. In this case, partially-ingested data is
+discarded, and Druid will resume ingestion from the last committed set of
stream offsets. This ensures exactly-once publishing behavior.
+- [Hadoop-based batch ingestion](../ingestion/hadoop.md). Each task publishes
all segment metadata in a single transaction.
+- [Native batch ingestion](../ingestion/native-batch.md). In parallel mode,
the supervisor task publishes all segment metadata in a single transaction
after the subtasks are finished. In simple (single-task) mode, the single task
publishes all segment metadata in a single transaction after it is complete.
+
+Additionally, some ingestion methods offer an _idempotency_ guarantee. This
means that repeated executions of the same ingestion will not cause duplicate
data to be ingested:
+
+- Supervised "seekable-stream" ingestion methods like
[Kafka](../development/extensions-core/kafka-ingestion.md) and
[Kinesis](../development/extensions-core/kinesis-ingestion.md) are idempotent
due to the fact that stream offsets and segment metadata are stored together
and updated in lock-step.
+- [Hadoop-based batch ingestion](../ingestion/hadoop.md) is idempotent unless
one of your input sources is the same Druid datasource that you are ingesting
into. In this case, running the same task twice is non-idempotent, because you
are adding to existing data instead of overwriting it.
+- [Native batch ingestion](../ingestion/native-batch.md) is idempotent unless
+[`appendToExisting`](../ingestion/native-batch.md) is true, or one of your
input sources is the same Druid datasource that you are ingesting into. In
either of these two cases, running the same task twice is non-idempotent,
because you are adding to existing data instead of overwriting it.
+
+On the query side, the Druid Broker is responsible for ensuring that a
consistent set of segments is involved in a given query. It selects the
appropriate set of segment versions to use when the query starts based on what
is currently available. This is supported by atomic replacement, a feature that
ensures that from a user's perspective, queries flip instantaneously from an
older version of data to a newer set of data, with no consistency or
performance impact.
+This is used for Hadoop-based batch ingestion, native batch ingestion when
`appendToExisting` is false, and compaction.
+
+Note that atomic replacement happens for each time chunk individually. If a
batch ingestion task or compaction involves multiple time chunks, then each
time chunk will undergo atomic replacement soon after the task finishes, but
the replacements will not all happen simultaneously.
+
+Typically, atomic replacement in Druid is based on a core set concept that
works in conjunction with segment versions.
+When a time chunk is overwritten, a new core set of segments is created with a
higher version number. The core set must all be available before the Broker
will use them instead of the older set. There can also only be one core set per
version per time chunk. Druid will also only use a single version at a time per
time chunk. Together, these properties provide Druid's atomic replacement
guarantees.
+
+Druid also supports an experimental segment locking mode that is activated by
setting
+[`forceTimeChunkLock`](../ingestion/tasks.md#context) to false in the context
of an ingestion task. In this case, Druid creates an atomic update group using
the existing version for the time chunk, instead of creating a new core set
with a new version number. There can be multiple atomic update groups with the
same version number per time chunk. Each one replaces a specific set of earlier
segments in the same time chunk and with the same version number. Druid will
query the latest one th [...]
+
+If segments become unavailable due to multiple Historicals going offline
simultaneously (beyond your replication factor), then Druid queries will
include only the segments that are still available. In the background, Druid
will reload these unavailable segments on other Historicals as quickly as
possible, at which point they will be included in queries again.
\ No newline at end of file
diff --git a/docs/development/experimental-features.md
b/docs/development/experimental-features.md
index 36c72822b4d..9e5252e9fe7 100644
--- a/docs/development/experimental-features.md
+++ b/docs/development/experimental-features.md
@@ -34,10 +34,10 @@ Note that this document does not track the status of
contrib extensions, all of
- [SQL-based ingestion concepts](../multi-stage-query/concepts.md)
- [SQL-based ingestion and multi-stage query task
API](../api-reference/sql-ingestion-api.md)
-## Indexer process
+## Indexer service
-- [Indexer process](../design/indexer.md)
-- [Processes and servers](../design/processes.md#indexer-process-optional)
+- [Indexer service](../design/indexer.md)
+- [Data server](../design/architecture.md#indexer-service-optional)
## Kubernetes
diff --git a/docs/development/modules.md b/docs/development/modules.md
index 75f4bbbe546..5b31d2d0aaa 100644
--- a/docs/development/modules.md
+++ b/docs/development/modules.md
@@ -105,7 +105,7 @@ In addition to DataSegmentPusher and DataSegmentPuller, you
can also bind:
* DataSegmentKiller: Removes segments, used as part of the Kill Task to delete
unused segments, i.e. perform garbage collection of segments that are either
superseded by newer versions or that have been dropped from the cluster.
* DataSegmentMover: Allow migrating segments from one place to another,
currently this is only used as part of the MoveTask to move unused segments to
a different S3 bucket or prefix, typically to reduce storage costs of unused
data (e.g. move to glacier or cheaper storage)
-* DataSegmentArchiver: Just a wrapper around Mover, but comes with a
pre-configured target bucket/path, so it doesn't have to be specified at
runtime as part of the ArchiveTask.
+* DataSegmentArchiver: Just a wrapper around Mover, but comes with a
preconfigured target bucket/path, so it doesn't have to be specified at runtime
as part of the ArchiveTask.
### Validating your deep storage implementation
diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md
index 007c2e93cd9..fe3e6e4ec5b 100644
--- a/docs/ingestion/index.md
+++ b/docs/ingestion/index.md
@@ -31,7 +31,7 @@ For most ingestion methods, the Druid
[MiddleManager](../design/middlemanager.md
[Indexer](../design/indexer.md) processes load your source data. The sole
exception is Hadoop-based ingestion, which
uses a Hadoop MapReduce job on YARN.
-During ingestion, Druid creates segments and stores them in [deep
storage](../design/deep-storage.md). Historical nodes load the segments into
memory to respond to queries. For streaming ingestion, the Middle Managers and
indexers can respond to queries in real-time with arriving data. See the
[Storage design](../design/architecture.md#storage-design) section of the Druid
design documentation for more information.
+During ingestion, Druid creates segments and stores them in [deep
storage](../design/deep-storage.md). Historical nodes load the segments into
memory to respond to queries. For streaming ingestion, the Middle Managers and
indexers can respond to queries in real-time with arriving data. For more
information, see [Storage overview](../design/storage.md).
This topic introduces streaming and batch ingestion methods. The following
topics describe ingestion concepts and information that apply to all [ingestion
methods](#ingestion-methods):
diff --git a/docs/querying/query-processing.md
b/docs/querying/query-processing.md
new file mode 100644
index 00000000000..c94a4bf9cec
--- /dev/null
+++ b/docs/querying/query-processing.md
@@ -0,0 +1,48 @@
+---
+id: query-processing
+title: "Query processing"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+This topic provides a high-level overview of how Apache Druid distributes and
processes queries.
+
+The general flow is as follows:
+
+1. A query enters the [Broker](../design/broker.md) service, which identifies
the segments with data that may pertain to that query. The list of segments is
always pruned by time, and may also be pruned by other attributes depending on
how the datasource is partitioned.
+2. The Broker identifies which [Historical](../design/historical.md) and
[MiddleManager](../design/middlemanager.md) services are serving those segments
and distributes a rewritten subquery to each of the services.
+3. The Historical and MiddleManager services execute each subquery and return
results to the Broker.
+4. The Broker merges the partial results to get the final answer, which it
returns to the original caller.
+
+Druid uses time and attribute pruning to minimize the data it must scan for
each query.
+
+For filters that are more precise than what the Broker uses for pruning, the
[indexing structures](../design/storage.md#indexing-and-handoff) inside each
segment allow Historical services to identify matching rows before accessing
the data. Once the Historical service knows which rows match a particular
query, it only accesses the required rows and columns.
+
+To maximize query performance, Druid uses the following techniques:
+
+- Pruning the set of segments accessed for a query.
+- Within each segment, using indexes to identify which rows must be accessed.
+- Within each segment, only reading the specific rows and columns that are
relevant to a particular query.
+
+## Learn more
+
+See the following topic for more information:
+
+* [Query execution](../querying/query-execution.md) to learn how Druid
services process query statements.
\ No newline at end of file
diff --git a/website/.spelling b/website/.spelling
index b23c250c8ba..5304e3eea16 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -280,6 +280,7 @@ codebase
codec
colocated
colocation
+colocating
compactable
compactionTask
config
@@ -434,7 +435,7 @@ pre-aggregation
pre-computation
pre-compute
pre-computing
-pre-configured
+preconfigured
pre-existing
pre-filtered
pre-filtering
diff --git a/website/redirects.js b/website/redirects.js
index eb869ac175a..db3160513e6 100644
--- a/website/redirects.js
+++ b/website/redirects.js
@@ -310,11 +310,14 @@ const Redirects=[
"from": "/docs/latest/operations/api-reference.html",
"to": "/docs/latest/api-reference/"
},
+ {
+ "from": "/docs/latest/design/processes.html",
+ "to": "/docs/latest/design/architecture"
+ },
{
"from": "/docs/latest/operations/api-reference/",
"to": "/docs/latest/api-reference/"
- },
-
+ }
]
diff --git a/website/sidebars.json b/website/sidebars.json
index c8ee4ef3859..e3b8186e237 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -30,8 +30,22 @@
],
"Design": [
"design/architecture",
+ {
+ "type": "category",
+ "label": "Services",
+ "items": [
+ "design/coordinator",
+ "design/overlord",
+ "design/broker",
+ "design/router",
+ "design/historical",
+ "design/middlemanager",
+ "design/peons",
+ "design/indexer"
+ ]
+ },
+ "design/storage",
"design/segments",
- "design/processes",
"design/deep-storage",
"design/metadata-storage",
"design/zookeeper"
@@ -125,6 +139,7 @@
]
},
"querying/querying",
+ "querying/query-processing",
"querying/query-execution",
"querying/troubleshooting",
{
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]