This is an automated email from the ASF dual-hosted git repository.
amatya pushed a commit to branch 27.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/27.0.0 by this push:
new 25e6e28e65 docs: query from deep storage (#14609) (#14754)
25e6e28e65 is described below
commit 25e6e28e657767885f2126e1ec1e016c1d20e873
Author: 317brian <[email protected]>
AuthorDate: Fri Aug 4 00:25:23 2023 -0700
docs: query from deep storage (#14609) (#14754)
* docs: query from deep storage (#14609)
* cold tier wip
* wip
* copyedits
* wip
* copyedits
* copyedits
* wip
* wip
* update rules page
* typo
* typo
* update sidebar
* moves durable storage info to its own page in operations
* update screenshots
* add apache license
* Apply suggestions from code review
Co-authored-by: Victoria Lim <[email protected]>
* add query from deep storage tutorial stub
* address some of the feedback
* revert screenshot update. handled in separate pr
* load rule update
* wip tutorial
* reformat deep storage endpoints
* rest of tutorial
* typo
* cleanup
* screenshot and sidebar for tutorial
* add license
* typos
* Apply suggestions from code review
Co-authored-by: Victoria Lim <[email protected]>
* rest of review comments
* clarify where results are stored
* update api reference for durablestorage context param
* Apply suggestions from code review
Co-authored-by: Karan Kumar <[email protected]>
* comments
* incorporate #14720
* address rest of comments
* missed one
* Update docs/api-reference/sql-api.md
* Update docs/api-reference/sql-api.md
---------
Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: demo-kratia <[email protected]>
Co-authored-by: Karan Kumar <[email protected]>
(cherry picked from commit 3b5b6c6a418c20b7e70e42074202b14b0ff45c7c)
* change code tab styles to Docusaurus2 format from Docu1 format
* fix spelling file
* doc cleanup
(cherry picked from commit 2218a134cfbfbb737f05a0455fdd24f3f7b79afd)
---------
Co-authored-by: Laksh Singla <[email protected]>
---
docs/api-reference/sql-api.md | 864 ++++++++++++++++++++-
.../tutorial-query-deepstorage-retention-rule.png | Bin 0 -> 99106 bytes
docs/design/architecture.md | 23 +-
docs/design/deep-storage.md | 26 +-
docs/multi-stage-query/reference.md | 55 +-
docs/operations/durable-storage.md | 86 ++
docs/operations/rule-configuration.md | 10 +-
docs/querying/query-from-deep-storage.md | 195 +++++
docs/tutorials/tutorial-query-deep-storage.md | 293 +++++++
website/.spelling | 1 +
website/sidebars.json | 3 +
11 files changed, 1487 insertions(+), 69 deletions(-)
diff --git a/docs/api-reference/sql-api.md b/docs/api-reference/sql-api.md
index aaaf499851..bfb74a4e02 100644
--- a/docs/api-reference/sql-api.md
+++ b/docs/api-reference/sql-api.md
@@ -4,7 +4,12 @@ title: Druid SQL API
sidebar_label: Druid SQL
---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
<!--
+
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@@ -34,7 +39,7 @@ The Druid SQL API is available at
`https://ROUTER:8888/druid/v2/sql`, where `ROU
To use the SQL API to make Druid SQL queries, send your query to the Router
using the POST method:
```
POST https://ROUTER:8888/druid/v2/sql/
-```
+```
Submit your query as the value of a "query" field in the JSON object within
the request payload. For example:
```json
@@ -42,7 +47,7 @@ Submit your query as the value of a "query" field in the JSON
object within the
```
### Request body
-
+
|Property|Description|Default|
|--------|----|-----------|
|`query`|SQL query string.| none (required)|
@@ -178,12 +183,861 @@ You can cancel the query using the query id `myQuery01`
as follows:
curl --request DELETE 'https://ROUTER:8888/druid/v2/sql/myQuery01' \
```
-Cancellation requests require READ permission on all resources used in the SQL
query.
+Cancellation requests require READ permission on all resources used in the SQL
query.
Druid returns an HTTP 202 response for successful deletion requests.
Druid returns an HTTP 404 response in the following cases:
- `sqlQueryId` is incorrect.
- The query completes before your cancellation request is processed.
-
-Druid returns an HTTP 403 response for authorization failure.
\ No newline at end of file
+
+Druid returns an HTTP 403 response for authorization failure.
+
+## Query from deep storage
+
+> Query from deep storage is an [experimental
feature](../development/experimental.md).
+
+You can use the `sql/statements` endpoint to query segments that exist only in
deep storage and are not loaded onto your Historical processes as determined by
your load rules.
+
+Note that at least one segment of a datasource must be available on a
Historical process so that the Broker can plan your query. A quick way to check
this is to see whether the datasource is visible in the Druid console.
+
+
+For more information, see [Query from deep
storage](../querying/query-from-deep-storage.md).
+
+### Submit a query
+
+Submit a query for data stored in deep storage. Any data ingested into Druid
is placed into deep storage. The query is contained in the "query" field in the
JSON object within the request payload.
+
+Note that at least part of a datasource must be available on a Historical
process so that Druid can plan your query. Also, only the user who submits a
query can see the results.
+
+#### URL
+
+<code class="postAPI">POST</code> <code>/druid/v2/sql/statements</code>
+
+#### Request body
+
+Generally, the `sql` and `sql/statements` endpoints support the same request
body fields with minor differences. For general information about the available
fields, see [Submit a query to the `sql` endpoint](#submit-a-query).
+
+Keep the following in mind when submitting queries to the `sql/statements`
endpoint:
+
+- There are additional context parameters for `sql/statements`:
+
+ - `executionMode` determines how query results are fetched. Druid
currently only supports `ASYNC`. You must manually retrieve your results after
the query completes.
+ - `selectDestination` determines where final results get written. By
default, results are written to task reports. Set this parameter to
`durableStorage` to instruct Druid to write the results from SELECT queries to
durable storage, which allows you to fetch larger result sets. Note that this
requires you to have [durable storage for MSQ
enabled](../operations/durable-storage.md).
+
+- The only supported value for `resultFormat` is JSON Lines.
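+
+For example, a request body that fetches results asynchronously and writes SELECT results to durable storage might combine both context parameters as follows. This is a minimal sketch; the query itself is illustrative:
+
+```json
+{
+  "query": "SELECT * FROM wikipedia LIMIT 10",
+  "context": {
+    "executionMode": "ASYNC",
+    "selectDestination": "durableStorage"
+  }
+}
+```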
+
+#### Responses
+
+<Tabs>
+
+<TabItem value="1" label="200 SUCCESS">
+
+
+*Successfully queried from deep storage*
+
+</TabItem>
+<TabItem value="2" label="400 BAD REQUEST">
+
+
+*Error thrown due to bad query. Returns a JSON object detailing the error with
the following format:*
+
+```json
+{
+ "error": "Summary of the encountered error.",
+ "errorClass": "Class of exception that caused this error.",
+ "host": "The host on which the error occurred.",
+ "errorCode": "Well-defined error code.",
+ "persona": "Role or persona associated with the error.",
+ "category": "Classification of the error.",
+ "errorMessage": "Summary of the encountered issue with expanded
information.",
+ "context": "Additional context about the error."
+}
+```
+
+</TabItem>
+</Tabs>
+
+---
+
+#### Sample request
+
+<Tabs>
+
+<TabItem value="3" label="cURL">
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \
+--header 'Content-Type: application/json' \
+--data '{
+ "query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''",
+ "context": {
+ "executionMode":"ASYNC"
+ }
+}'
+```
+
+</TabItem>
+<TabItem value="4" label="HTTP">
+
+
+```HTTP
+POST /druid/v2/sql/statements HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+Content-Type: application/json
+Content-Length: 134
+
+{
+ "query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
+ "context": {
+ "executionMode":"ASYNC"
+ }
+}
+```
+
+</TabItem>
+</Tabs>
+
+#### Sample response
+
+<details>
+ <summary>Click to show sample response</summary>
+
+ ```json
+{
+ "queryId": "query-b82a7049-b94f-41f2-a230-7fef94768745",
+ "state": "ACCEPTED",
+ "createdAt": "2023-07-26T21:16:25.324Z",
+ "schema": [
+ {
+ "name": "__time",
+ "type": "TIMESTAMP",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "channel",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "cityName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "comment",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "countryIsoCode",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "countryName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "isAnonymous",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isMinor",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isNew",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isRobot",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isUnpatrolled",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "metroCode",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "namespace",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "page",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "regionIsoCode",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "regionName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "user",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "delta",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "added",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "deleted",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ }
+ ],
+ "durationMs": -1
+}
+ ```
+</details>
+
+### Get query status
+
+Retrieves information about the query associated with the given query ID. The
response matches the response from the POST API if the query is accepted or
running and the execution mode is `ASYNC`. In addition to the fields that this
endpoint shares with `POST /sql/statements`, a completed query's status
includes the following:
+
+- A `result` object that summarizes information about your results, such as
the total number of rows and sample records.
+- A `pages` object that includes the following information for each page of
results:
+ - `numRows`: the number of rows in that page of results.
+ - `sizeInBytes`: the size of the page.
+ - `id`: the page number that you can use to reference a specific page when
you get query results.
+
+#### URL
+
+<code class="getAPI">GET</code> <code>/druid/v2/sql/statements/:queryId</code>
+
+#### Responses
+
+<Tabs>
+
+<TabItem value="5" label="200 SUCCESS">
+
+
+*Successfully retrieved query status*
+
+</TabItem>
+<TabItem value="6" label="400 BAD REQUEST">
+
+
+*Error thrown due to bad query. Returns a JSON object detailing the error with
the following format:*
+
+```json
+{
+ "error": "Summary of the encountered error.",
+ "errorCode": "Well-defined error code.",
+ "persona": "Role or persona associated with the error.",
+ "category": "Classification of the error.",
+ "errorMessage": "Summary of the encountered issue with expanded
information.",
+ "context": "Additional context about the error."
+}
+```
+
+</TabItem>
+</Tabs>
+
+#### Sample request
+
+The following example retrieves the status of a query with the specified ID
`query-9b93f6f7-ab0e-48f5-986a-3520f84f0804`.
+
+<Tabs>
+
+<TabItem value="7" label="cURL">
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804"
+
+</TabItem>
+<TabItem value="8" label="HTTP">
+
+
+```HTTP
+GET /druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804 HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+</TabItem>
+</Tabs>
+
+#### Sample response
+
+<details>
+ <summary>Click to show sample response</summary>
+
+ ```json
+{
+ "queryId": "query-9b93f6f7-ab0e-48f5-986a-3520f84f0804",
+ "state": "SUCCESS",
+ "createdAt": "2023-07-26T22:57:46.620Z",
+ "schema": [
+ {
+ "name": "__time",
+ "type": "TIMESTAMP",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "channel",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "cityName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "comment",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "countryIsoCode",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "countryName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "isAnonymous",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isMinor",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isNew",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isRobot",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "isUnpatrolled",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "metroCode",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "namespace",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "page",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "regionIsoCode",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "regionName",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "user",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ },
+ {
+ "name": "delta",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "added",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ },
+ {
+ "name": "deleted",
+ "type": "BIGINT",
+ "nativeType": "LONG"
+ }
+ ],
+ "durationMs": 25591,
+ "result": {
+ "numTotalRows": 1,
+ "totalSizeInBytes": 375,
+ "dataSource": "__query_select",
+ "sampleRecords": [
+ [
+ 1442018873259,
+ "#ja.wikipedia",
+ "",
+ "/* 対戦通算成績と得失点 */",
+ "",
+ "",
+ 0,
+ 1,
+ 0,
+ 0,
+ 0,
+ 0,
+ "Main",
+ "アルビレックス新潟の年度別成績一覧",
+ "",
+ "",
+ "BlueMoon2662",
+ 14,
+ 14,
+ 0
+ ]
+ ],
+ "pages": [
+ {
+ "id": 0,
+ "numRows": 1,
+ "sizeInBytes": 375
+ }
+ ]
+ }
+}
+ ```
+</details>
+
+
+### Get query results
+
+Retrieves results for completed queries. Results are separated into pages, so
you can use the optional `page` parameter to refine the results you get. Druid
returns information about the composition of each page and its page number
(`id`). For information about pages, see [Get query status](#get-query-status).
+
+If a page number isn't passed, all results are returned sequentially in the
same response. If you have large result sets, you may encounter timeouts based
on the value configured for `druid.router.http.readTimeout`.
+
+When getting query results, keep the following in mind:
+
+- JSON Lines is the only supported result format.
+- Getting the query results for an ingestion query returns an empty response.
+
+#### URL
+
+<code class="getAPI">GET</code>
<code>/druid/v2/sql/statements/:queryId/results</code>
+
+#### Query parameters
+* `page`
+ * Int (optional)
+ * Refine paginated results
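+
+For example, a sketch of fetching a single page of results; the query ID is the one used in the sample request below, and the page value is illustrative:
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results?page=0"
+```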
+
+#### Responses
+
+<Tabs>
+
+<TabItem value="9" label="200 SUCCESS">
+
+
+*Successfully retrieved query results*
+
+</TabItem>
+<TabItem value="10" label="400 BAD REQUEST">
+
+
+*Query in progress. Returns a JSON object detailing the error with the
following format:*
+
+```json
+{
+ "error": "Summary of the encountered error.",
+ "errorCode": "Well-defined error code.",
+ "persona": "Role or persona associated with the error.",
+ "category": "Classification of the error.",
+ "errorMessage": "Summary of the encountered issue with expanded
information.",
+ "context": "Additional context about the error."
+}
+```
+
+</TabItem>
+<TabItem value="11" label="404 NOT FOUND">
+
+
+*Query not found, failed or canceled*
+
+</TabItem>
+<TabItem value="12" label="500 SERVER ERROR">
+
+
+*Error thrown due to bad query. Returns a JSON object detailing the error with
the following format:*
+
+```json
+{
+ "error": "Summary of the encountered error.",
+ "errorCode": "Well-defined error code.",
+ "persona": "Role or persona associated with the error.",
+ "category": "Classification of the error.",
+ "errorMessage": "Summary of the encountered issue with expanded
information.",
+ "context": "Additional context about the error."
+}
+```
+
+</TabItem>
+</Tabs>
+
+---
+
+#### Sample request
+
+The following example retrieves the results of a query with the specified ID
`query-f3bca219-173d-44d4-bdc7-5002e910352f`.
+
+<Tabs>
+
+<TabItem value="13" label="cURL">
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results"
+
+</TabItem>
+<TabItem value="14" label="HTTP">
+
+
+```HTTP
+GET /druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+</TabItem>
+</Tabs>
+
+#### Sample response
+
+<details>
+ <summary>Click to show sample response</summary>
+
+ ```json
+[
+ {
+ "__time": 1442018818771,
+ "channel": "#en.wikipedia",
+ "cityName": "",
+ "comment": "added project",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 0,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Talk",
+ "page": "Talk:Oswald Tilghman",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "GELongstreet",
+ "delta": 36,
+ "added": 36,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018820496,
+ "channel": "#ca.wikipedia",
+ "cityName": "",
+ "comment": "Robot inserta {{Commonscat}} que enllaça amb
[[commons:category:Rallicula]]",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 1,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Rallicula",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "PereBot",
+ "delta": 17,
+ "added": 17,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018825474,
+ "channel": "#en.wikipedia",
+ "cityName": "Auburn",
+ "comment": "/* Status of peremptory norms under international law */
fixed spelling of 'Wimbledon'",
+ "countryIsoCode": "AU",
+ "countryName": "Australia",
+ "isAnonymous": 1,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 0,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Peremptory norm",
+ "regionIsoCode": "NSW",
+ "regionName": "New South Wales",
+ "user": "60.225.66.142",
+ "delta": 0,
+ "added": 0,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018828770,
+ "channel": "#vi.wikipedia",
+ "cityName": "",
+ "comment": "fix Lỗi CS1: ngày tháng",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 1,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Apamea abruzzorum",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "Cheers!-bot",
+ "delta": 18,
+ "added": 18,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018831862,
+ "channel": "#vi.wikipedia",
+ "cityName": "",
+ "comment": "clean up using [[Project:AWB|AWB]]",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Atractus flammigerus",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "ThitxongkhoiAWB",
+ "delta": 18,
+ "added": 18,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018833987,
+ "channel": "#vi.wikipedia",
+ "cityName": "",
+ "comment": "clean up using [[Project:AWB|AWB]]",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Agama mossambica",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "ThitxongkhoiAWB",
+ "delta": 18,
+ "added": 18,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018837009,
+ "channel": "#ca.wikipedia",
+ "cityName": "",
+ "comment": "/* Imperi Austrohongarès */",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 0,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Campanya dels Balcans (1914-1918)",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "Jaumellecha",
+ "delta": -20,
+ "added": 0,
+ "deleted": 20
+ },
+ {
+ "__time": 1442018839591,
+ "channel": "#en.wikipedia",
+ "cityName": "",
+ "comment": "adding comment on notability and possible COI",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 1,
+ "isRobot": 0,
+ "isUnpatrolled": 1,
+ "metroCode": 0,
+ "namespace": "Talk",
+ "page": "Talk:Dani Ploeger",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "New Media Theorist",
+ "delta": 345,
+ "added": 345,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018841578,
+ "channel": "#en.wikipedia",
+ "cityName": "",
+ "comment": "Copying assessment table to wiki",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "User",
+ "page": "User:WP 1.0 bot/Tables/Project/Pubs",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "WP 1.0 bot",
+ "delta": 121,
+ "added": 121,
+ "deleted": 0
+ },
+ {
+ "__time": 1442018845821,
+ "channel": "#vi.wikipedia",
+ "cityName": "",
+ "comment": "clean up using [[Project:AWB|AWB]]",
+ "countryIsoCode": "",
+ "countryName": "",
+ "isAnonymous": 0,
+ "isMinor": 0,
+ "isNew": 0,
+ "isRobot": 1,
+ "isUnpatrolled": 0,
+ "metroCode": 0,
+ "namespace": "Main",
+ "page": "Agama persimilis",
+ "regionIsoCode": "",
+ "regionName": "",
+ "user": "ThitxongkhoiAWB",
+ "delta": 18,
+ "added": 18,
+ "deleted": 0
+ }
+]
+ ```
+</details>
+
+### Cancel a query
+
+Cancels a running or accepted query.
+
+#### URL
+
+<code class="deleteAPI">DELETE</code>
<code>/druid/v2/sql/statements/:queryId</code>
+
+#### Responses
+
+<Tabs>
+
+<TabItem value="15" label="200 OK">
+
+
+*A no-op operation, returned when the query is not in a state that can be canceled*
+
+</TabItem>
+<TabItem value="16" label="202 ACCEPTED">
+
+
+*Successfully accepted query for cancellation*
+
+</TabItem>
+<TabItem value="17" label="404 SERVER ERROR">
+
+
+*Invalid query ID. Returns a JSON object detailing the error with the
following format:*
+
+```json
+{
+ "error": "Summary of the encountered error.",
+ "errorCode": "Well-defined error code.",
+ "persona": "Role or persona associated with the error.",
+ "category": "Classification of the error.",
+ "errorMessage": "Summary of the encountered issue with expanded
information.",
+ "context": "Additional context about the error."
+}
+```
+
+</TabItem>
+</Tabs>
+
+---
+
+#### Sample request
+
+The following example cancels a query with the specified ID
`query-945c9633-2fa2-49ab-80ae-8221c38c024da`.
+
+<Tabs>
+
+<TabItem value="18" label="cURL">
+
+
+```shell
+curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da"
+```
+
+</TabItem>
+<TabItem value="19" label="HTTP">
+
+
+```HTTP
+DELETE /druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+</TabItem>
+</Tabs>
+
+#### Sample response
+
+A successful request returns a `202 ACCEPTED` response code and an empty response body.
diff --git a/docs/assets/tutorial-query-deepstorage-retention-rule.png
b/docs/assets/tutorial-query-deepstorage-retention-rule.png
new file mode 100644
index 0000000000..9dee37bdea
Binary files /dev/null and
b/docs/assets/tutorial-query-deepstorage-retention-rule.png differ
diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index 323cfb4fd3..df59dcb25e 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been
ingested into the system
storage accessible by every Druid server. In a clustered deployment, this is
typically a distributed object store like S3 or
HDFS, or a network mounted filesystem. In a single-server deployment, this is
typically local disk.
-Druid uses deep storage only as a backup of your data and as a way to transfer
data in the background between
-Druid processes. Druid stores data in files called _segments_. Historical
processes cache data segments on
-local disk and serve queries from that cache as well as from an in-memory
cache.
-This means that Druid never needs to access deep storage
-during a query, helping it offer the best query latencies possible. It also
means that you must have enough disk space
-both in deep storage and across your Historical servers for the data you plan
to load.
+Druid uses deep storage for the following purposes:
+
+- To store all the data you ingest. Segments that get loaded onto Historical
processes for low latency queries are also kept in deep storage for backup
purposes. Additionally, segments that are only in deep storage can be used for
[queries from deep storage](../querying/query-from-deep-storage.md).
+- As a way to transfer data in the background between Druid processes. Druid
stores data in files called _segments_.
+
+Historical processes cache data segments on local disk and serve queries from
that cache as well as from an in-memory cache.
+Segments on disk for Historical processes provide the low latency querying
performance Druid is known for.
+
+You can also query directly from deep storage. When you query segments that
exist only in deep storage, you trade some performance for the ability to
query more of your data without necessarily having to scale your Historical
processes.
+
+When determining sizing for your storage, keep the following in mind:
+
+- Deep storage needs to be able to hold all the data that you ingest into
Druid.
+- On-disk storage for Historical processes needs to be able to accommodate the
data you want to load onto them to run queries. The data on Historical
processes should be data you access frequently and need low latency queries
for.
Deep storage is an important part of Druid's elastic, fault-tolerant design.
Druid bootstraps from deep storage even
if every single data server is lost and re-provisioned.
@@ -210,8 +218,7 @@ available before they are published, since they are only
published when the segm
any additional rows of data.
2. **Deep storage:** Segment data files are pushed to deep storage once a
segment is done being constructed. This
happens immediately before publishing metadata to the metadata store.
-3. **Availability for querying:** Segments are available for querying on some
Druid data server, like a realtime task
-or a Historical process.
+3. **Availability for querying:** Segments are available for querying on some
Druid data server, like a realtime task, directly from deep storage, or a
Historical process.
You can inspect the state of currently active segments using the Druid SQL
[`sys.segments` table](../querying/sql-metadata-tables.md#segments-table). It
includes the following flags:
diff --git a/docs/design/deep-storage.md b/docs/design/deep-storage.md
index f5adf35c6a..0674f32429 100644
--- a/docs/design/deep-storage.md
+++ b/docs/design/deep-storage.md
@@ -23,9 +23,15 @@ title: "Deep storage"
-->
-Deep storage is where segments are stored. It is a storage mechanism that
Apache Druid does not provide. This deep storage infrastructure defines the
level of durability of your data, as long as Druid processes can see this
storage infrastructure and get at the segments stored on it, you will not lose
data no matter how many Druid nodes you lose. If segments disappear from this
storage layer, then you will lose whatever data those segments represented.
+Deep storage is where segments are stored. It is a storage mechanism that
Apache Druid does not provide. This deep storage infrastructure defines the
level of durability of your data. As long as Druid processes can see this
storage infrastructure and get at the segments stored on it, you will not lose
data no matter how many Druid nodes you lose. If segments disappear from this
storage layer, then you will lose whatever data those segments represented.
-## Local
+In addition to being the backing store for segments, deep storage lets you
[query from deep storage](#querying-from-deep-storage) to run queries against
segments stored primarily in deep storage. The [load
rules](../operations/rule-configuration.md#load-rules) you configure determine
whether segments exist primarily in deep storage or in a combination of deep
storage and Historical processes.
+
+## Deep storage options
+
+Druid supports multiple options for deep storage, including blob storage from
major cloud providers. Select the one that fits your environment.
+
+### Local
Local storage is intended for use in the following situations:
@@ -55,22 +61,28 @@ druid.storage.storageDirectory=/tmp/druid/localStorage
The `druid.storage.storageDirectory` must be set to a different path than
`druid.segmentCache.locations` or
`druid.segmentCache.infoDir`.
-## Amazon S3 or S3-compatible
+### Amazon S3 or S3-compatible
See [`druid-s3-extensions`](../development/extensions-core/s3.md).
-## Google Cloud Storage
+### Google Cloud Storage
See [`druid-google-extensions`](../development/extensions-core/google.md).
-## Azure Blob Storage
+### Azure Blob Storage
See [`druid-azure-extensions`](../development/extensions-core/azure.md).
-## HDFS
+### HDFS
See [druid-hdfs-storage extension
documentation](../development/extensions-core/hdfs.md).
-## Additional options
+### Additional options
For additional deep storage options, please see our [extensions
list](../configuration/extensions.md).
+
+## Querying from deep storage
+
+Although not as performant as querying segments stored on disk for Historical
processes, querying from deep storage lets you access segments that you don't
need frequently or don't need with the extremely low latency that Druid queries
traditionally provide. You trade some performance for a lower total storage
cost because you can access more of your data without increasing the number or
capacity of your Historical processes.
+
+For information about how to run queries, see [Query from deep
storage](../querying/query-from-deep-storage.md).
\ No newline at end of file
diff --git a/docs/multi-stage-query/reference.md
b/docs/multi-stage-query/reference.md
index 8f9e53b557..0cab3a2fd8 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -342,59 +342,24 @@ CLUSTERED BY user
The context parameter that sets `sqlJoinAlgorithm` to `sortMerge` is not shown
in the above example.
-## Durable Storage
+## Durable storage
-Using durable storage with your SQL-based ingestion can improve their
reliability by writing intermediate files to a storage location temporarily.
+SQL-based ingestion supports using durable storage to store intermediate files
temporarily. Enabling it can improve reliability. For more information, see
[Durable storage](../operations/durable-storage.md).
-To prevent durable storage from getting filled up with temporary files in case
the tasks fail to clean them up, a periodic
-cleaner can be scheduled to clean the directories corresponding to which there
isn't a controller task running. It utilizes
-the storage connector to work upon the durable storage. The durable storage
location should only be utilized to store the output
-for cluster's MSQ tasks. If the location contains other files or directories,
then they will get cleaned up as well.
-
-Enabling durable storage also enables the use of local disk to store temporary
files, such as the intermediate files produced
-by the super sorter. Tasks will use whatever has been configured for their
temporary usage as described in [Configuring task storage
sizes](../ingestion/tasks.md#configuring-task-storage-sizes)
-If the configured limit is too low, `NotEnoughTemporaryStorageFault` may be
thrown.
-
-### Enable durable storage
-
-To enable durable storage, you need to set the following common service
properties:
-
-```
-druid.msq.intermediate.storage.enable=true
-druid.msq.intermediate.storage.type=s3
-druid.msq.intermediate.storage.bucket=YOUR_BUCKET
-druid.msq.intermediate.storage.prefix=YOUR_PREFIX
-druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
-```
-
-For detailed information about the settings related to durable storage, see
[Durable storage configurations](#durable-storage-configurations).
-
-
-### Use durable storage for queries
-
-When you run a query, include the context parameter `durableShuffleStorage`
and set it to `true`.
-
-For queries where you want to use fault tolerance for workers, set
`faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to
`true`.
-
-Set `selectDestination`:`durableStorage` for select queries that want to write
the final results to durable storage instead of the task reports. Saving the
results in the durable
-storage allows users to fetch large result sets. The location where the
workers write the intermediate results is different than the location where
final results get stored. Therefore, `durableShuffleStorage`:`false` and
-`selectDestination`:`durableStorage` is a valid configuration to use in the
query context, that instructs the controller to persist only the final result
in the durable storage, and not the
-intermediate results.
-
-
-## Durable storage configurations
+### Durable storage configurations
The following common service properties control how durable storage behaves:
|Parameter |Default | Description
|
|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you
want to store intermediate files. |
-|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the
size of each chunk to temporarily store in
`druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB
and 5 GiB. A large chunk size reduces the API calls made to the durable
storage, however it requires more disk space to store the temporary chunks.
Druid uses a default of 100MiB if the value is not provided.|
-|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable
durable storage for the cluster.|
-|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max
number times to attempt S3 API calls to avoid failures due to transient errors.
|
-|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store
intermediate stage results. Provide a unique value for the prefix. Don't share
the same prefix between clusters. If the location includes other files or
directories, then they will get cleaned up as well. |
+|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable
durable storage for the cluster. For more information about enabling durable
storage, see [Durable storage](../operations/durable-storage.md).|
+|`druid.msq.intermediate.storage.type` | `s3` for Amazon S3 | Required. The
type of storage to use. `s3` is the only supported storage type. |
+|`druid.msq.intermediate.storage.bucket` | n/a | The S3 bucket to store
intermediate files. |
+|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store
intermediate stage results. Provide a unique value for the prefix. Don't share
the same prefix between clusters. If the location includes other files or
directories, then they will get cleaned up as well. |
|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on
the local disk to temporarily store intermediate stage results. |
-|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 |
Required. The type of storage to use. You can either set this to `local` or
`s3`. |
+|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the
maximum number of times to attempt S3 API calls to avoid failures due to
transient errors. |
+|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the
size of each chunk to temporarily store in
`druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB
and 5 GiB. A large chunk size reduces the API calls made to the durable
storage, however it requires more disk space to store the temporary chunks.
Druid uses a default of 100MiB if the value is not provided.|
+
In addition to the common service properties, there are certain properties
that you configure on the Overlord specifically to clean up intermediate files:
diff --git a/docs/operations/durable-storage.md
b/docs/operations/durable-storage.md
new file mode 100644
index 0000000000..80545f9a9b
--- /dev/null
+++ b/docs/operations/durable-storage.md
@@ -0,0 +1,86 @@
+---
+id: durable-storage
+title: "Durable storage for the multi-stage query engine"
+sidebar_label: "Durable storage"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+You can use durable storage to improve querying from deep storage and
SQL-based ingestion.
+
+> Note that only S3 is supported as a durable storage location.
+
+Durable storage for queries from deep storage provides a location to write the
results of deep storage queries. Durable storage for SQL-based ingestion
temporarily houses intermediate files, which can improve reliability.
+
+Enabling durable storage also enables the use of local disk to store temporary
files, such as the intermediate files produced
+while sorting the data. Tasks will use whatever has been configured for their
temporary usage as described in [Configuring task storage
sizes](../ingestion/tasks.md#configuring-task-storage-sizes).
+If the configured limit is too low, Druid may throw a
`NotEnoughTemporaryStorageFault` error.
+
+## Enable durable storage
+
+To enable durable storage, you need to set the following common service
properties:
+
+```
+druid.msq.intermediate.storage.enable=true
+druid.msq.intermediate.storage.type=s3
+druid.msq.intermediate.storage.bucket=YOUR_BUCKET
+druid.msq.intermediate.storage.prefix=YOUR_PREFIX
+druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
+```
+
+For detailed information about the settings related to durable storage, see
[Durable storage
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
+
+
+## Use durable storage for SQL-based ingestion queries
+
+When you run a query, include the context parameter `durableShuffleStorage`
and set it to `true`.
+
+For queries where you want to use fault tolerance for workers, set
`faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to
`true`.
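+
+For example, a minimal sketch of the query context for a SQL-based ingestion query that writes intermediate files to durable storage:
+
+```json
+  "context": {
+    ...
+    "durableShuffleStorage": true
+  }
+```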
+
+## Use durable storage for queries from deep storage
+
+Depending on the size of the results you're expecting, you may need to save
the final results for queries from deep storage to durable storage.
+
+By default, Druid saves the final results for queries from deep storage to
task reports. Generally, this is acceptable for smaller result sets but may
lead to timeouts for larger result sets.
+
+When you run a query, include the context parameter `selectDestination` and
set it to `DURABLESTORAGE`:
+
+```json
+ "context":{
+ ...
+ "selectDestination": "DURABLESTORAGE"
+ }
+```
+
+You can also write intermediate results to durable storage
(`durableShuffleStorage`) for better reliability. The location where workers
write intermediate results is different than the location where final results
get stored. This means that durable storage for results can be enabled even if
you don't write intermediate results to durable storage.
+
+If you write the results for queries from deep storage to durable storage, the
results are cleaned up when the task is removed from the metadata store.
+
+## Durable storage clean up
+
+To prevent durable storage from filling up with temporary files when tasks
fail to clean them up, you can schedule a periodic cleaner that removes the
directories that have no corresponding controller task running. The cleaner
uses the storage connector to operate on the durable storage. Use the durable
storage location only to store output for the cluster's MSQ tasks. If the
location contains other files or directories, they will get cleaned up as well.
+
+Use `druid.msq.intermediate.storage.cleaner.enabled` and
`druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner.
For more information, see [Durable storage
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
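+
+For example, a sketch of the Overlord service properties; the delay value shown is illustrative, not a recommended default:
+
+```
+druid.msq.intermediate.storage.cleaner.enabled=true
+# Illustrative value: run the cleaner once a day
+druid.msq.intermediate.storage.cleaner.delaySeconds=86400
+```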
+
+Note that if you choose to write query results to durable storage, the results
are cleaned up when the task is removed from the metadata store.
+
diff --git a/docs/operations/rule-configuration.md
b/docs/operations/rule-configuration.md
index 8d12beac96..aa42ef461b 100644
--- a/docs/operations/rule-configuration.md
+++ b/docs/operations/rule-configuration.md
@@ -107,7 +107,7 @@ In the web console you can use the up and down arrows on
the right side of the i
## Load rules
-Load rules define how Druid assigns segments to [historical process
tiers](./mixed-workloads.md#historical-tiering), and how many replicas of a
segment exist in each tier.
+Load rules define how Druid assigns segments to [Historical process
tiers](./mixed-workloads.md#historical-tiering), and how many replicas of a
segment exist in each tier.
If you have a single tier, Druid automatically names the tier `_default`. If
you define an additional tier, you must define a load rule to specify which
segments to load on that tier. Until you define a load rule, your new tier
remains empty.
@@ -120,6 +120,8 @@ All load rules can have these properties:
Specific types of load rules discussed below may have other properties too.
+Load rules are also how you take advantage of the resource savings that
[querying data from deep storage](../querying/query-from-deep-storage.md)
provides. One way to configure data so that certain segments are not loaded
onto Historical tiers but are available to query from deep storage is to set
`tieredReplicants` to an empty array and `useDefaultTierForNull` to `false` for
those segments, either by interval or by period, as in the sketch below.
+
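+For example, a sketch of a period load rule that keeps matching segments in deep storage only; the period value is illustrative:
+
+```json
+{
+  "type": "loadByPeriod",
+  "period": "P1M",
+  "includeFuture": true,
+  "tieredReplicants": {},
+  "useDefaultTierForNull": false
+}
+```
+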
### Forever load rule
The forever load rule assigns all datasource segments to specified tiers. It
is the default rule Druid applies to datasources. Forever load rules have type
`loadForever`.
@@ -167,7 +169,7 @@ Set the following properties:
- the segment interval starts any time after the rule interval starts.
You can use this property to load segments with future start and end dates,
where "future" is relative to the time when the Coordinator evaluates data
against the rule. Defaults to `true`.
-- `tieredReplicants`: a map of tier names to the number of segment replicas
for that tier.
+- `tieredReplicants`: a map of tier names to the number of segment replicas
for that tier.
- `useDefaultTierForNull`: This parameter determines the default value of
`tieredReplicants` and only has an effect if the field is not present. The
default value of `useDefaultTierForNull` is true.
### Interval load rule
@@ -190,7 +192,7 @@ Interval load rules have type `loadByInterval`. The
following example places one
Set the following properties:
- `interval`: the load interval specified as an [ISO
8601](https://en.wikipedia.org/wiki/ISO_8601) range encoded as a string.
-- `tieredReplicants`: a map of tier names to the number of segment replicas
for that tier.
+- `tieredReplicants`: a map of tier names to the number of segment replicas
for that tier.
- `useDefaultTierForNull`: This parameter determines the default value of
`tieredReplicants` and only has an effect if the field is not present. The
default value of `useDefaultTierForNull` is true.
## Drop rules
@@ -256,7 +258,7 @@ Set the following property:
### Interval drop rule
-You can use a drop interval rule to prevent Druid from loading a specified
range of data onto any tier. The range is typically your oldest data. The
dropped data resides in cold storage, but is not queryable. If you need to
query the data, update or remove the interval drop rule so that Druid reloads
the data.
+You can use a drop interval rule to prevent Druid from loading a specified
range of data onto any tier. The range is typically your oldest data. The
dropped data resides in deep storage and can still be [queried from deep
storage](../querying/query-from-deep-storage.md).
Interval drop rules have type `dropByInterval` and the following JSON
structure:
diff --git a/docs/querying/query-from-deep-storage.md
b/docs/querying/query-from-deep-storage.md
new file mode 100644
index 0000000000..5f076ca47c
--- /dev/null
+++ b/docs/querying/query-from-deep-storage.md
@@ -0,0 +1,195 @@
+---
+id: query-deep-storage
+title: "Query from deep storage"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+> Query from deep storage is an [experimental
feature](../development/experimental.md).
+
+Druid can query segments that are only stored in deep storage. Running a query
from deep storage is slower than running queries from segments that are loaded
on Historical processes, but it's a great tool for data that you either access
infrequently or where the low latency results that typical Druid queries
provide are not necessary. Queries from deep storage can increase the surface
area of data available to query without requiring you to scale your Historical
processes to accommodate more data.
+
+## Keep segments in deep storage only
+
+Any data you ingest into Druid is already stored in deep storage, so you don't
need to perform any additional configuration from that perspective. However, to
take advantage of the cost savings that querying from deep storage provides,
make sure not all your segments get loaded onto Historical processes.
+
+To do this, configure [load
rules](../operations/rule-configuration.md#load-rules) to manage which segments
exist only in deep storage and which get loaded onto Historical processes.
+
+The easiest way to do this is to explicitly configure the segments that don't
get loaded onto Historical processes. Set `tieredReplicants` to an empty array
and `useDefaultTierForNull` to `false`. For example, if you configure the
following rule for a datasource:
+
+```json
+[
+ {
+ "interval": "2016-06-27T00:00:00.000Z/2016-06-27T02:59:00.000Z",
+ "tieredReplicants": {},
+ "useDefaultTierForNull": false,
+ "type": "loadByInterval"
+ }
+]
+```
+
+Any segment that falls within the specified interval exists only in deep
storage. Segments that aren't in this interval use the default cluster load
rules or any other load rules you configure.
+
+To configure the load rules through the Druid console, go to **Datasources >
... in the Actions column > Edit retention rules**. Then, paste the provided
JSON into the JSON tab:
+
+
+
+
+You can verify that a segment is not loaded on any Historical tiers by
querying the Druid metadata table:
+
+```sql
+SELECT "segment_id", "replication_factor" FROM sys."segments" WHERE
"replication_factor" = 0 AND "datasource" = YOUR_DATASOURCE
+```
+
+Segments with a `replication_factor` of `0` are not assigned to any Historical
tiers. Queries against these segments are run directly against the segment in
deep storage.
+
+You can also confirm this through the Druid console. On the **Segments** page,
see the **Replication factor** column.
+
+Keep the following in mind when working with load rules to control what exists
only in deep storage:
+
+- At least one of the segments in a datasource must be loaded onto a
Historical process so that Druid can plan the query. The segment on the
Historical process can be any segment from the datasource; it does not need to
be a specific segment. One way to verify that a datasource has at least one
segment on a Historical process is to check whether it's visible in the Druid
console.
+- The actual number of replicas may differ from the replication factor
temporarily as Druid processes your load rules.
+
+## Run a query from deep storage
+
+### Submit a query
+
+You can query data from deep storage by submitting a query to the API using
`POST /sql/statements` or the Druid console. Druid uses the multi-stage query
(MSQ) task engine to perform the query.
+
+To run a query from deep storage, send your query to the Router using the POST
method:
+
+```
+POST https://ROUTER:8888/druid/v2/sql/statements
+```
+
+Submitting a query from deep storage uses the same syntax as any other Druid
SQL query where the query is contained in the "query" field in the JSON object
within the request payload. For example:
+
+```json
+{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"}
+```
+
+Generally, the request body fields are the same between the `sql` and
`sql/statements` endpoints.
+
+There are additional context parameters for `sql/statements` specifically:
+
+ - `executionMode` (required) determines how query results are fetched. Set
this to `ASYNC`.
+ - `selectDestination` (optional), when set to `durableStorage`, instructs
Druid to write the results from SELECT queries to durable storage. Note that
this requires you to have [durable storage for MSQ
enabled](../operations/durable-storage.md).
+
+The following sample query includes the two additional context parameters that
querying from deep storage supports:
+
+```
+curl --location 'http://localhost:8888/druid/v2/sql/statements' \
+--header 'Content-Type: application/json' \
+--data '{
+ "query":"SELECT * FROM \"YOUR_DATASOURCE\" where \"__time\"
>TIMESTAMP'\''2017-09-01'\'' and \"__time\" <= TIMESTAMP'\''2017-09-02'\''",
+ "context":{
+ "executionMode":"ASYNC",
+ "selectDestination": "durableStorage"
+
+ }
+}'
+```
+
+The response for submitting a query includes the query ID along with basic
information, such as when you submitted the query and the schema of the results:
+
+```json
+{
+  "queryId": "query-ALPHANUMERIC-STRING",
+  "state": "ACCEPTED",
+  "createdAt": CREATION_TIMESTAMP,
+  "schema": [
+    {
+      "name": COLUMN_NAME,
+      "type": COLUMN_TYPE,
+      "nativeType": COLUMN_TYPE
+    },
+    ...
+  ],
+  "durationMs": DURATION_IN_MS
+}
+```
+
+
+### Get query status
+
+You can check the status of a query with the following API call:
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/QUERYID
+```
+
+The call returns the status of the query, such as `ACCEPTED` or `RUNNING`.
Before you attempt to get results, make sure the state is `SUCCESS`.
+
+When you check the status of a successful query, the response includes useful
information about your query results, including a sample record and information
about how the results are organized by `pages`. The information for each page
includes the following:
+
+- `numRows`: the number of rows in that page of results
+- `sizeInBytes`: the size of the page
+- `id`: the indexed page number that you can use to reference a specific page
when you get query results
+
+You can use `page` as a parameter to refine the results you retrieve.
+
+The following snippet shows the structure of the `result` object:
+
+```json
+{
+ ...
+ "result": {
+ "numTotalRows": INTEGER,
+ "totalSizeInBytes": INTEGER,
+ "dataSource": "__query_select",
+ "sampleRecords": [
+ [
+ RECORD_1,
+ RECORD_2,
+ ...
+ ]
+ ],
+ "pages": [
+ {
+ "numRows": INTEGER,
+ "sizeInBytes": INTEGER,
+ "id": INTEGER_PAGE_NUMBER
+ }
+ ...
+ ]
+}
+}
+```
+
+### Get query results
+
+Only the user who submitted a query can retrieve the results for the query.
+
+Use the following endpoint to retrieve results:
+
+```
+GET
https://ROUTER:8888/druid/v2/sql/statements/QUERYID/results?page=PAGENUMBER&size=RESULT_SIZE&timeout=TIMEOUT_MS
+```
+
+Results are returned in JSON format.
+
+You can use the optional `page`, `size`, and `timeout` parameters to refine
your results. You can retrieve the `page` information for your results by
fetching the status of the completed query.
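+
+For example, a sketch of fetching the first page with a result size cap and a timeout; the parameter values are illustrative:
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/QUERYID/results?page=0&size=10&timeout=5000
+```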
+
+When you try to get results for a query from deep storage, you may receive an
error that states the query is still running. Wait until the query completes
before you try again.
+
+## Further reading
+
+* [Query from deep storage
tutorial](../tutorials/tutorial-query-deep-storage.md)
+* [Query from deep storage API
reference](../api-reference/sql-api.md#query-from-deep-storage)
diff --git a/docs/tutorials/tutorial-query-deep-storage.md
b/docs/tutorials/tutorial-query-deep-storage.md
new file mode 100644
index 0000000000..5502ad9422
--- /dev/null
+++ b/docs/tutorials/tutorial-query-deep-storage.md
@@ -0,0 +1,293 @@
+---
+id: tutorial-query-deep-storage
+title: "Tutorial: Query from deep storage"
+sidebar_label: "Query from deep storage"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+> Query from deep storage is an [experimental
feature](../development/experimental.md).
+
+Query from deep storage allows you to query segments that are stored only in
deep storage, which provides lower costs than if you were to load everything
onto Historical processes. The tradeoff is that queries from deep storage may
take longer to complete.
+
+This tutorial walks you through loading example data, configuring load rules
so that not all the segments get loaded onto Historical processes, and querying
data from deep storage.
+
+To run the queries in this tutorial, replace `ROUTER:PORT` with the location
of the Router process and its port number. For example, use `localhost:8888`
for the quickstart deployment.
+
+For more general information, see [Query from deep
storage](../querying/query-from-deep-storage.md).
+
+## Load example data
+
+Use the **Load data** wizard or the following SQL query to ingest the
`wikipedia` sample datasource bundled with Druid. If you use the wizard, make
sure you change the partitioning to be by hour.
+
+Partitioning by hour provides more segment granularity, so you can selectively
load segments onto Historicals or keep them in deep storage.
+
+<details><summary>Show the query</summary>
+
+```sql
+REPLACE INTO "wikipedia" OVERWRITE ALL
+WITH "ext" AS (SELECT *
+FROM TABLE(
+ EXTERN(
+
'{"type":"http","uris":["https://druid.apache.org/data/wikipedia.json.gz"]}',
+ '{"type":"json"}'
+ )
+) EXTEND ("isRobot" VARCHAR, "channel" VARCHAR, "timestamp" VARCHAR, "flags"
VARCHAR, "isUnpatrolled" VARCHAR, "page" VARCHAR, "diffUrl" VARCHAR, "added"
BIGINT, "comment" VARCHAR, "commentLength" BIGINT, "isNew" VARCHAR, "isMinor"
VARCHAR, "delta" BIGINT, "isAnonymous" VARCHAR, "user" VARCHAR, "deltaBucket"
BIGINT, "deleted" BIGINT, "namespace" VARCHAR, "cityName" VARCHAR,
"countryName" VARCHAR, "regionIsoCode" VARCHAR, "metroCode" BIGINT,
"countryIsoCode" VARCHAR, "regionName" VARCHAR))
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "isRobot",
+ "channel",
+ "flags",
+ "isUnpatrolled",
+ "page",
+ "diffUrl",
+ "added",
+ "comment",
+ "commentLength",
+ "isNew",
+ "isMinor",
+ "delta",
+ "isAnonymous",
+ "user",
+ "deltaBucket",
+ "deleted",
+ "namespace",
+ "cityName",
+ "countryName",
+ "regionIsoCode",
+ "metroCode",
+ "countryIsoCode",
+ "regionName"
+FROM "ext"
+PARTITIONED BY HOUR
+```
+
+</details>
+
+## Configure a load rule
+
+The load rule configures Druid to keep any segments that fall within the
following interval only in deep storage:
+
+```
+2016-06-27T00:00:00.000Z/2016-06-27T02:59:00.000Z
+```
+
+The JSON form of the rule is as follows:
+
+```json
+[
+ {
+ "interval": "2016-06-27T00:00:00.000Z/2016-06-27T02:59:00.000Z",
+ "tieredReplicants": {},
+ "useDefaultTierForNull": false,
+ "type": "loadByInterval"
+ }
+]
+```
+
+The rest of the segments use the default load rules for the cluster. For the
quickstart, that means all the other segments get loaded onto Historical
processes.
+
+You can configure the load rules through the API or the Druid console. To
configure the load rules through the Druid console, go to **Datasources > ...
in the Actions column > Edit retention rules**. Then, paste the provided JSON
into the JSON tab:
+
+
+
+
+### Verify the replication factor
+
+Segments that are only available from deep storage have a `replication_factor`
of 0 in the Druid system table. You can verify that your load rule worked as
intended using the following query:
+
+```sql
+SELECT "segment_id", "replication_factor", "num_replicas" FROM sys."segments"
WHERE datasource = 'wikipedia'
+```
+
+You can also verify it through the Druid console by checking the **Replication
factor** column in the **Segments** view.
+
+Note that the number of replicas and replication factor may differ temporarily
as Druid processes your retention rules.
+
+## Query from deep storage
+
+Now that there are segments that are only available from deep storage, run the
following query:
+
+```sql
+SELECT page FROM wikipedia WHERE __time < TIMESTAMP'2016-06-27 00:10:00' LIMIT 10
+```
+
+With the context parameter:
+
+```json
+"executionMode": "ASYNC"
+```
+
+For example, run the following curl command:
+
+```
+curl --location 'http://localhost:8888/druid/v2/sql/statements' \
+--header 'Content-Type: application/json' \
+--data '{
+ "query":"SELECT page FROM wikipedia WHERE __time <
TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
+ "context":{
+ "executionMode":"ASYNC"
+ }
+}'
+```
+
+This query looks for records with timestamps that precede `00:10:00`. Based on
the load rule you configured earlier, this data is only available from deep
storage.
+
+When you submit the query from deep storage through the API, you get the
following response:
+
+<details><summary>Show the response</summary>
+
+```json
+{
+ "queryId": "query-6888b6f6-e597-456c-9004-222b05b97051",
+ "state": "ACCEPTED",
+ "createdAt": "2023-07-28T21:59:02.334Z",
+ "schema": [
+ {
+ "name": "page",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ }
+ ],
+ "durationMs": -1
+}
+```
+
+Make sure you note the `queryId`. You'll need it to interact with the query.
+
+</details>
+
+Compare this to submitting the query to Druid SQL's regular endpoint, `POST /sql`:
+
+```
+curl --location 'http://localhost:8888/druid/v2/sql/' \
+--header 'Content-Type: application/json' \
+--data '{
+ "query":"SELECT page FROM wikipedia WHERE __time <
TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
+ "context":{
+ "executionMode":"ASYNC"
+ }
+}'
+```
+
+The response you get back is empty because there are no records on the
Historicals that match the query.
+
+## Get query status
+
+Replace `:queryId` with the ID for your query and run the following curl
command to get your query status:
+
+```
+curl --location --request GET 'http://localhost:8888/druid/v2/sql/statements/:queryId' \
+--header 'Content-Type: application/json'
+```
+
+
+### Response for a running query
+
+The response for a running query is the same as the response from when you
submitted the query, except that the `state` is `RUNNING` instead of `ACCEPTED`.
+
+### Response for a completed query
+
+A successful query also returns a `pages` object that includes the page
numbers (`id`), rows per page (`numRows`), and the size of the page
(`sizeInBytes`). You can pass the page number as a parameter when you get
results to refine the results you get.
+
+Note that `sampleRecords` has been truncated for brevity.
+
+<details><summary>Show the response</summary>
+
+```json
+{
+ "queryId": "query-6888b6f6-e597-456c-9004-222b05b97051",
+ "state": "SUCCESS",
+ "createdAt": "2023-07-28T21:59:02.334Z",
+ "schema": [
+ {
+ "name": "page",
+ "type": "VARCHAR",
+ "nativeType": "STRING"
+ }
+ ],
+ "durationMs": 87351,
+ "result": {
+ "numTotalRows": 152,
+ "totalSizeInBytes": 9036,
+ "dataSource": "__query_select",
+ "sampleRecords": [
+ [
+ "Salo Toraut"
+ ],
+ [
+ "利用者:ワーナー成増/放送ウーマン賞"
+ ],
+ [
+ "Bailando 2015"
+ ],
+ ...
+ ...
+ ...
+ ],
+ "pages": [
+ {
+ "id": 0,
+ "numRows": 152,
+ "sizeInBytes": 9036
+ }
+ ]
+ }
+}
+```
+
+</details>
+
+## Get query results
+
+Replace `:queryId` with the ID for your query and run the following curl
command to get your query results:
+
+```
+curl --location 'http://ROUTER:PORT/druid/v2/sql/statements/:queryId/results'
+```
+
+Note that the response has been truncated for brevity.
+
+<details><summary>Show the response</summary>
+
+```json
+[
+ {
+ "page": "Salo Toraut"
+ },
+ {
+ "page": "利用者:ワーナー成増/放送ウーマン賞"
+ },
+ {
+ "page": "Bailando 2015"
+ },
+ ...
+ ...
+ ...
+]
+```
+
+</details>
+
+## Further reading
+
+* [Query from deep storage](../querying/query-from-deep-storage.md)
+* [Query from deep storage API
reference](../api-reference/sql-api.md#query-from-deep-storage)
\ No newline at end of file
diff --git a/website/.spelling b/website/.spelling
index 5b12d9b3ea..3ea9178552 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -504,6 +504,7 @@ supervisorTaskId
SVG
symlink
syntaxes
+TabItem
tiering
timeseries
Timeseries
diff --git a/website/sidebars.json b/website/sidebars.json
index 458d2bfe03..0dbb95a44b 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -24,6 +24,7 @@
"tutorials/tutorial-kerberos-hadoop",
"tutorials/tutorial-sql-query-view",
"tutorials/tutorial-unnest-arrays",
+ "tutorials/tutorial-query-deep-storage",
"tutorials/tutorial-jupyter-index",
"tutorials/tutorial-jupyter-docker",
"tutorials/tutorial-jdbc"
@@ -98,6 +99,7 @@
"label": "Druid SQL",
"ids": [
"querying/sql",
+ "querying/query-deep-storage",
"querying/sql-data-types",
"querying/sql-operators",
"querying/sql-scalar",
@@ -200,6 +202,7 @@
"Operations": [
"operations/web-console",
"operations/java",
+ "operations/durable-storage",
{
"type": "subcategory",
"label": "Security",
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]