vogievetsky commented on code in PR #12983:
URL: https://github.com/apache/druid/pull/12983#discussion_r958966215
##########
docs/multi-stage-query/msq-api.md:
##########
@@ -0,0 +1,1634 @@
+---
+id: api
+title: SQL-based ingestion APIs
+sidebar_label: API
+---
+
+> SQL-based ingestion using the multi-stage query task engine is our
recommended solution starting in Druid 24.0. Alternative ingestion solutions,
such as native batch and Hadoop-based ingestion systems, will still be
supported. We recommend you read all [known issues](./msq-known-issues.md) and
test the feature in a development environment before rolling it out in
production. Using the multi-stage query task engine with `SELECT` statements
that do not write to a datasource is experimental.
+
+The **Query** view in the Druid console provides the most stable experience
for the multi-stage query task engine (MSQ task engine) and multi-stage query
architecture. Use the UI if you do not need a programmatic interface.
+
+When using the API for the MSQ task engine, the action you want to take
determines the endpoint you use:
+
+- `/druid/v2/sql/task` endpoint: Submit a query for ingestion.
+- `/druid/indexer/v1/task` endpoint: Interact with a query, including getting
its status, getting its details, or canceling it. This page describes a few of
the Overlord Task APIs that you can use with the MSQ task engine. For
information about Druid APIs, see the [API reference for
Druid](../operations/api-reference.md#tasks).
+
+## Submit a query
+
+You submit queries to the MSQ task engine using the `POST /druid/v2/sql/task/`
endpoint.
+
+### Request
+
+Currently, the MSQ task engine ignores the provided values of `resultFormat`,
`header`,
+`typesHeader`, and `sqlTypesHeader`. SQL SELECT queries always behave as if
`resultFormat` is `array`, `header` is
+`true`, `typesHeader` is `true`, and `sqlTypesHeader` is `true`.
+
+For task queries similar to the [example queries](./msq-example-queries.md),
you need to escape characters such as quotation marks (") if you use something
like `curl`.
+You don't need to escape characters if you use a method that can parse JSON
seamlessly, such as Python.
+The Python example in this topic escapes quotation marks although it's not
required.
+
+The following example is the same query that you submit when you complete
[Convert a JSON ingestion spec](./msq-tutorial-convert-ingest-spec.md) where
you insert data into a table named `wikipedia`.
+
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--HTTP-->
+
+```
+POST /druid/v2/sql/task
+```
+
+```json
+{
+ "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS
__time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\":
[\"https://static.imply.io/data/wikipedia.json.gz\"]}',\n '{\"type\":
\"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\":
\"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\":
\"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\":
\"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\",
\"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"},
{\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\":
\"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\":
\"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\":
\"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\":
\"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\":
\"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"},
{\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\":
\"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\":
\"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\":
\"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\":
\"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\":
\"user\", \"type\": \"string\"}]'\n )\n)\nPARTITIONED BY DAY",
+ "context": {
+ "maxNumTasks": 3
+ }
+}
+```
+
+<!--curl-->
+
+Make sure you replace `username`, `password`, `your-instance`, and `port` with
the values for your deployment.
+
+```bash
+curl --location --request POST 'https://<username>:<password>@<your-instance>:<port>/druid/v2/sql/task/' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+ "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS
__time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\":
[\"https://static.imply.io/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\":
\"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\":
\"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\":
\"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\":
\"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\",
\"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"},
{\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\":
\"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\":
\"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\":
\"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\":
\"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\":
\"string\"}, {\"name\": \"isRobot\",
\"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"},
{\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\",
\"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\":
\"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\":
\"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\":
\"user\", \"type\": \"string\"}]'\''\n )\n)\nPARTITIONED BY DAY",
+ "context": {
+ "maxNumTasks": 3
+ }
+}'
+```
+
+<!--Python-->
+Make sure you replace `username`, `password`, `your-instance`, and `port` with
the values for your deployment.
+
+```python
+import json
+import requests
+
+url = "https://<username>:<password>@<your-instance>:<port>/druid/v2/sql/task/"
+
+payload = json.dumps({
+ "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS
__time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\":
[\"https://static.imply.io/data/wikipedia.json.gz\"]}',\n '{\"type\":
\"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\":
\"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\":
\"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\":
\"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\",
\"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"},
{\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\":
\"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\":
\"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\":
\"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\":
\"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\":
\"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"},
{\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\":
\"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\":
\"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\":
\"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\":
\"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\":
\"user\", \"type\": \"string\"}]'\n )\n)\nPARTITIONED BY DAY",
+ "context": {
+ "maxNumTasks": 3
+ }
+})
+headers = {
+ 'Content-Type': 'application/json'
+}
+
+response = requests.request("POST", url, headers=headers, data=payload)
+
+print(response.text)
+
+```
+
+<!--END_DOCUSAURUS_CODE_TABS-->
+
+
+### Response
+
+```json
+{
+ "taskId": "query-f795a235-4dc7-4fef-abac-3ae3f9686b79",
+ "state": "RUNNING"
+}
+```
+
+**Response fields**
+
+|Field|Description|
+|-----|-----------|
+ | taskId | Controller task ID. You can use Druid's standard [task
APIs](../operations/api-reference.md#overlord) to interact with this controller
task.|
Review Comment:
Is there a strange indent here, or is the GitHub UI buggy?
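
One way to rule out the GitHub UI is to check the raw file for table rows that begin with whitespace before the first `|`. A minimal sketch, assuming the Markdown content is available as a string; the helper name `find_indented_table_rows` and the `sample` text are hypothetical and not part of this PR:

```python
def find_indented_table_rows(markdown_text):
    """Return (line_number, line) pairs for table rows with whitespace before the first '|'."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        # A table row starts with '|'; flag it only if leading whitespace was removed.
        if stripped.startswith("|") and stripped != line:
            hits.append((number, line))
    return hits


# Hypothetical sample mirroring the table in the hunk above.
sample = (
    "|Field|Description|\n"
    "|-----|-----------|\n"
    " | taskId | Controller task ID. |\n"
)

for number, line in find_indented_table_rows(sample):
    print(number, repr(line))
```

If the helper reports the `taskId` row, the indent is in the file itself rather than a rendering quirk.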