gianm commented on code in PR #18252:
URL: https://github.com/apache/druid/pull/18252#discussion_r2240528323


##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+---
+id: dart
+title: "SQL queries using the Dart query engine"
+sidebar_label: "Dart query engine"
+description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine.
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ License); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+:::info[Experimental]
+
+Dart is experimental. For production use, we recommend using the other available query engines.
+
+:::
+
+
+Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage.

Review Comment:
   This intro should come from a different direction: Dart isn't meant as an 
alternative to MSQ tasks, it's meant as an alternative to the native engine. 
It's meant for situations where the native engine performs poorly because of 
insufficient parallelism, such as:
   
   - large joins (which Dart can do with a parallel sort-merge)
   - high-cardinality exact group-bys
   - high-cardinality exact count distinct
   
   In these situations, Dart can parallelize throughout the entire query, which 
leads to better performance.
   
   The introduction should also explain how Dart works. Briefly, it's a profile 
of MSQ that runs `SELECT` queries on Brokers and Historicals, rather than on 
tasks. Brokers act as controllers and Historicals act as workers.
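   To make the "parallel sort-merge" point concrete, here is a rough single-process Python sketch of the general technique. It shuffles (hash-partitions) both join inputs on the key, then sorts and merges each partition independently on a thread pool. This only illustrates the idea. It is not Dart's implementation, and the function names and rows-as-dicts representation are hypothetical. Because of Python's GIL, the thread pool shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, num_partitions):
    """Shuffle step: route each row to a partition by hashing its join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def sort_merge_join(left, right, key):
    """Inner join of two row lists on `key`: sort both sides, then merge."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            # Emit the cross product of the two runs.
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out


def parallel_sort_merge_join(left, right, key, num_partitions=4):
    """Partition both inputs the same way, then join matching partitions in parallel."""
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(
            lambda pair: sort_merge_join(pair[0], pair[1], key),
            zip(left_parts, right_parts),
        )
        # Concatenate the per-partition results.
        return [row for part in results for row in part]
```

   Both sides use the same hash function and partition count, so rows with equal keys always meet in the same partition. That property lets each partition be joined without looking at any other partition.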



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+
+You can query batch or realtime datasources with Dart.
+
+## Enable Dart
+
+To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files:
+
+```
+druid.msq.dart.enabled = true
+```
+
+### Configure resource consumption
+
+You can configure the Broker and the Historical to tune Dart's resource consumption.
+
+For Brokers, you can set the following configs:
+
+- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1.
+- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data.
+
+For Historicals, you can set the following configs:
+
+- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers.

Review Comment:
   This should be stronger than "Ideally"; see the above comment on 
`druid.msq.dart.controller.concurrentQueries`.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap.
+
+
+## Run a Dart query
+
+Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries.
+
+### Druid console
+
+In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu.
+
+### API
+
+Dart uses the SQL endpoint `/druid/v2/sql`. To use Dart, include the query context parameter `engine` and set it to `msq-dart`:
+
+<Tabs>
+  <TabItem value="SET" label="SET" default>
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+--header 'Content-Type: application/json' \
+--data '{
+  "query":   "SET engine = 'msq-dart';\nSELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",
+  ...
+  ...
+}'
+  ```
+
+  </TabItem>
+  <TabItem value="context_block" label="Context block">
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "query":   "SELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",
+
+  ...
+  ...
+  "context": {
+    "engine":"msq-dart"
+    ...
+  }
+  }'
+  ```
+
+  </TabItem>
+  </Tabs>
+
+Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters).
+
+  ## Known issues and limitations
+
+  - If you encounter an issue where Dart can't find a segment, try rerunning your query.
+  - If your data includes HLL Sketches for realtime data, Dart returns a `NullPointerException`.

Review Comment:
   Does this really happen? If so, we should raise a GitHub issue with more 
details.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+For Brokers, you can set the following configs:
+
+- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1.

Review Comment:
   Important: the total `druid.msq.dart.controller.concurrentQueries` across 
all Brokers must be less than `druid.msq.dart.worker.concurrentQueries` on any 
one Historical, or else queries can potentially get stuck waiting for each 
other. The experimental version of Dart does not verify this for you, so it's 
important for admins to double-check it.
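   As a rough aid for admins, here is a Python sketch of the check described above. It takes the per-Broker `druid.msq.dart.controller.concurrentQueries` values and the per-Historical `druid.msq.dart.worker.concurrentQueries` values as plain dicts. The host names, and how you collect the values, are up to you. It applies the strict "less than" rule from this comment. The draft text says "equal to or larger than", so confirm the boundary case before it goes into the docs.

```python
def check_dart_concurrency(broker_controller_limits, historical_worker_limits):
    """Return a list of problems with Dart concurrency settings (empty if none).

    broker_controller_limits: {broker_host: druid.msq.dart.controller.concurrentQueries}
    historical_worker_limits: {historical_host: druid.msq.dart.worker.concurrentQueries}
    Rule assumed here: the Broker total must be strictly less than the
    worker limit on every Historical.
    """
    total_controllers = sum(broker_controller_limits.values())
    problems = []
    for host, worker_limit in sorted(historical_worker_limits.items()):
        if total_controllers >= worker_limit:
            problems.append(
                f"{host}: worker.concurrentQueries={worker_limit} is not greater than "
                f"the Broker total controller.concurrentQueries={total_controllers}"
            )
    return problems
```

   For example, with Brokers `{"broker-1": 2, "broker-2": 2}` and Historicals `{"hist-1": 8, "hist-2": 4}`, the Broker total is 4. The check therefore reports only `hist-2`.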



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+## Enable Dart
+
+To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files:

Review Comment:
   Let's recommend adding all of these configs to 
`_common/common.runtime.properties`. Only the Broker and Historical look at 
them, but it's easier to have them in one place.
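   Here is a small Python sketch of what consolidating could involve, in case it helps with migration notes. It collects the `druid.msq.dart.*` keys from existing Broker and Historical property sets (given as dicts) into one set for `_common/common.runtime.properties`. It refuses to merge if the same key has different values, because a single file can hold only one value per key. The function names are hypothetical.

```python
def merge_dart_properties(broker_props, historical_props):
    """Collect druid.msq.dart.* keys from both services into one dict.

    Raises ValueError if a key has different values in the two inputs,
    since one common.runtime.properties file holds one value per key.
    """
    merged = {}
    for props in (broker_props, historical_props):
        for key, value in props.items():
            if not key.startswith("druid.msq.dart."):
                continue  # Leave service-specific, non-Dart settings where they are.
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key}: {merged[key]!r} vs {value!r}")
            merged[key] = value
    return merged


def to_properties_text(props):
    """Render key=value lines, sorted by key, for a .properties file."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))
```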



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,97 @@
+:::info[Experimental]
+
+Dart is experimental. For production use, we recommend using the other available query engines.

Review Comment:
   IMO this is a bit too strong. I'd reword it as:
   
   > Dart is experimental. For production use cases that require a 
battle-tested query engine, we recommend the default `native` query engine.
   
   I say this because it's OK to use Dart in production if it's better than 
native for your use case. You should just be aware that it hasn't received as 
much testing, and use some caution.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data.

Review Comment:
   suggestion: "number of available threads on workers 
(`druid.processing.numThreads`)"
   
   Mention also that the default is 1, i.e. no multithreading on Historicals.
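   This sketch shows why the default of 1 means no multithreading. It assumes, based on the parameter's name and description, that the total number of shuffle partitions is roughly the number of workers times `targetPartitionsPerWorker`. Under that assumption, the default gives each Historical a single partition to process. The function names below are hypothetical:

```python
DEFAULT_TARGET_PARTITIONS_PER_WORKER = 1  # Default mentioned in the comment above.


def suggested_target_partitions_per_worker(processing_num_threads):
    """Match targetPartitionsPerWorker to druid.processing.numThreads on workers."""
    if processing_num_threads < 1:
        raise ValueError("druid.processing.numThreads must be at least 1")
    return processing_num_threads


def estimated_shuffle_partitions(num_workers, target_partitions_per_worker=DEFAULT_TARGET_PARTITIONS_PER_WORKER):
    """Assumed model: each worker is assigned target_partitions_per_worker partitions."""
    return num_workers * target_partitions_per_worker


# 10 Historicals: 10 partitions with the default, 80 if each has 8 processing threads.
print(estimated_shuffle_partitions(10))
print(estimated_shuffle_partitions(10, suggested_target_partitions_per_worker(8)))
```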



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+
+  ## Known issues and limitations

Review Comment:
   Some current known issues and limitations that come to mind for me:
   
   - Dart doesn't verify that `druid.msq.dart.controller.concurrentQueries` is 
set properly; that's up to the admin. If it's set too high, queries can get 
stuck waiting on each other.
   - Dart does not use the query cache.
   - Dart does not implement query prioritization or lanes.
   - Dart (like MSQ in general) does not implement `useApproximateTopN`.
   - Dart cannot be used with JDBC. The `engine` parameter is ignored.
   - https://github.com/apache/druid/pull/18336 can in some cases lead to 
`NoClassDefFoundError` for `NilStageOutputReader`.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters).

Review Comment:
   See above comment; we should list them all comprehensively so people don't 
have to guess.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+---
+id: dart
+title: "SQL queries using the Dart query engine"
+sidebar_label: "Dart query engine"
+description: Use the Dart query engine for light-weight queries that don't 
need all the capabilities of the MSQ task engine.
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ License); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+:::info[Experimental]
+
+Dart is experimental. For production use, we recommend using the other 
available query engines.
+
+:::
+
+
+Use the Dart query engine for light-weight queries that don't need all the 
capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries 
that have intermediate results consisting of hundreds of millions of rows. In 
this case, the Dart engine's multi-threaded workers perform in-memory shuffles 
using locally cached data without pulling from deep storage.
+
+You can query batch or realtime datasources with Dart.
+
+## Enable Dart
+
+To enable Dart, add the following line to your `broker/runtime.properties` and 
`historical/runtime.properties` files:
+
+```
+druid.msq.dart.enabled = true
+```
+
+### Configure resource consumption
+
+You can configure the Broker and the Historical to tune Dart's resource 
consumption.
+
+For Brokers, you can set the following configs:
+
+- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query 
controllers that can run concurrently on that Broker. Additional controllers 
are queued. Defaults to 1.
+- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of 
partitions per worker to create during a shuffle. Set this to the number of 
available threads on workers to fully take advantage of multi-threaded 
processing of shuffled data.
+
+For Historicals, you can set the following configs:
+
+- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query 
workers that can run concurrently on that Historical. Default is equal to the 
number of merge buffers because each query needs one merge buffer. Ideally, 
this should be equal to or larger than the sum of the `concurrentQueries` 
setting on your Brokers.
+- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available 
for use across all Dart queries as a decimal. The default is 0.35, 35% of heap.
+
+
+## Run a Dart query
+
+Once enabled, you can use Dart in the Druid console or the SQL query API to 
issue queries.
+
+### Druid console
+
+In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu.
+
+### API
+
+Dart uses the SQL endpoint `/druid/v2/sql`. To use Dart, include the query context parameter `engine` and set it to `msq-dart`:
+
+<Tabs>
+  <TabItem value="SET" label="SET" default>
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+--header 'Content-Type: application/json' \
+--data '{
+  "query":   "SET engine = 'msq-dart';\nSELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",
+  ...
+  ...
+}'
+  ```
+
+  </TabItem>
+  <TabItem value="context_block" label="Context block">
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+  --header 'Content-Type: application/json' \

Review Comment:
   Same with this example, better if it's valid.
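A minimal sketch of a valid body for the context-block variant, written in Python so that `json.dumps` produces the JSON instead of it being assembled by hand. `HOST:PORT` is still a placeholder, the query and the `wikipedia` datasource come from the draft, and `build_context_block_body` and `post_sql` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholder address; replace HOST:PORT with your Router or Broker.
SQL_ENDPOINT = "http://HOST:PORT/druid/v2/sql"

QUERY = (
    "SELECT\n"
    "  user,\n"
    "  commentLength,\n"
    '  COUNT(*) AS "COUNT"\n'
    "FROM wikipedia\n"
    "GROUP BY 1, 2\n"
    "ORDER BY 2 DESC"
)


def build_context_block_body(query: str) -> str:
    """Return a JSON request body that selects Dart through the query context."""
    body = {
        "query": query,
        "context": {"engine": "msq-dart"},
    }
    return json.dumps(body, indent=2)


def post_sql(body: str, url: str = SQL_ENDPOINT) -> bytes:
    """POST a JSON body to the SQL endpoint (hypothetical helper, not called here)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Print the body so it can be saved to a file and inspected or sent.
    print(build_context_block_body(QUERY))
```

To send it with curl instead, save the printed output as `body.json` and pass it with `curl --header 'Content-Type: application/json' --data-binary @body.json 'http://HOST:PORT/druid/v2/sql'`, which avoids embedding the JSON in a quoted shell argument.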



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+<Tabs>
+  <TabItem value="SET" label="SET" default>
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+--header 'Content-Type: application/json' \
+--data '{
+  "query":   "SET engine = 'msq-dart';\nSELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",

Review Comment:
   It would be better if this example used valid JSON. You can just include 
`"query"` by itself.
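A minimal sketch of the SET variant with `"query"` as the only field. Generating the body also sidesteps a quoting problem in the draft: the single quotes around `msq-dart` sit inside the single-quoted `--data '...'` shell argument, so the shell drops them and the query arrives as `SET engine = msq-dart;`. The helper name and the `set_body.py` file name are hypothetical, and `HOST:PORT` remains a placeholder.

```python
import json

# The SET statement selects the Dart engine, so no "context" field is needed.
QUERY = (
    "SET engine = 'msq-dart';\n"
    "SELECT\n"
    "  user,\n"
    "  commentLength,\n"
    '  COUNT(*) AS "COUNT"\n'
    "FROM wikipedia\n"
    "GROUP BY 1, 2\n"
    "ORDER BY 2 DESC"
)


def build_set_statement_body(query: str) -> str:
    """Return a JSON request body whose only field is "query"."""
    # json.dumps escapes the double quotes around COUNT and the newlines;
    # the single quotes around msq-dart need no escaping in JSON.
    return json.dumps({"query": query})


if __name__ == "__main__":
    # If this script is saved as set_body.py, it can be used like:
    #   python3 set_body.py > body.json
    #   curl --header 'Content-Type: application/json' \
    #        --data-binary @body.json 'http://HOST:PORT/druid/v2/sql'
    print(build_set_statement_body(QUERY))
```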



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,97 @@
+Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage.
+
+You can query batch or realtime datasources with Dart.

Review Comment:
   It's `includeSegmentSource`. It's mentioned in the "Context parameters" 
section of `docs/multi-stage-query/reference.md`, which should be replicated 
here with edits that make sense for Dart. In particular:
   
   - remove parameters that Dart doesn't use: `maxNumTasks`, `taskAssignment`, 
`maxParseExceptions`, `durableShuffleStorage`, `faultTolerance`, 
`selectDestination`, `rowsPerPage`, and anything not labeled `SELECT` (Dart 
doesn't do `INSERT` or `REPLACE`)
   - add Dart-specific parameters `maxConcurrentStages`, 
`targetPartitionsPerWorker`, `maxNonLeafWorkers`
   - update the default for `includeSegmentSource` to `REALTIME`
   
   Here's what the Dart-specific parameters do:
   
   - `maxConcurrentStages` is the number of stages that can run concurrently 
for a query. Default is 2. Higher numbers can potentially improve pipelining, 
but also mean less memory is available for each stage.
   - `targetPartitionsPerWorker` is the number of partitions we generate for 
each worker. It controls how much parallelism can be maintained throughout the 
query. Default is 1.
   - `maxNonLeafWorkers` is the number of workers to use for stages beyond the 
leaf stage. Default is 1, which is scatter-gather style.
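The parameters described above could be combined into a request body like the following minimal sketch. The defaults are the ones stated in this comment; `dart_context` and its rejection of parameters that Dart doesn't use are a hypothetical client-side convenience, not something Druid provides, and the value `4` in the example is an arbitrary illustration.

```python
import json

# Defaults as stated in the review comment for Dart.
DART_CONTEXT_DEFAULTS = {
    "includeSegmentSource": "REALTIME",
    "maxConcurrentStages": 2,
    "targetPartitionsPerWorker": 1,
    "maxNonLeafWorkers": 1,
}

# Parameters the review comment lists as not used by Dart.
UNUSED_BY_DART = {
    "maxNumTasks",
    "taskAssignment",
    "maxParseExceptions",
    "durableShuffleStorage",
    "faultTolerance",
    "selectDestination",
    "rowsPerPage",
}


def dart_context(**overrides):
    """Build a query context for Dart (hypothetical client-side helper).

    Raises ValueError for parameters that Dart does not use, so a mistaken
    setting is caught before the request is sent.
    """
    unused = UNUSED_BY_DART.intersection(overrides)
    if unused:
        raise ValueError(f"not used by Dart: {sorted(unused)}")
    context = {"engine": "msq-dart", **DART_CONTEXT_DEFAULTS}
    context.update(overrides)
    return context


if __name__ == "__main__":
    # Example: keep more parallelism after the leaf stage.
    body = {
        "query": "SELECT COUNT(*) FROM wikipedia",
        "context": dart_context(targetPartitionsPerWorker=4, maxNonLeafWorkers=4),
    }
    print(json.dumps(body, indent=2))
```

Sending the defaults explicitly is redundant; it only makes the effective settings visible in the request body.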



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+<Tabs>
+  <TabItem value="SET" label="SET" default>
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+--header 'Content-Type: application/json' \
+--data '{
+  "query":   "SET engine = 'msq-dart';\nSELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",
+  ...
+  ...
+}'
+  ```
+
+  </TabItem>
+  <TabItem value="context_block" label="Context block">
+    
+  ```sql
+  curl --location 'http://HOST:PORT/druid/v2/sql' \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "query":   "SELECT\n  user,\n  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC",
+
+  ...
+  ...
+  "context": {
+    "engine":"msq-dart"
+    ...
+  }
+  }'
+  ```
+
+  </TabItem>
+  </Tabs>
+
+Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters).
+
+  ## Known issues and limitations
+
+  - If you encounter an issue where Dart can't find a segment, try rerunning your query.
+  - If your data includes HLL Sketches for realtime data, Dart returns a `NullPointerException`.
+  - When a Dart query fails on a Historical with an error about no workers running for a query, it gets stuck retrying the query. If the query doesn't get canceled, it can cause other queries to fail.

Review Comment:
   I think this was fixed by #17277, so let's remove it.



##########
docs/querying/dart.md:
##########
@@ -0,0 +1,116 @@
+  ## Known issues and limitations
+
+  - If you encounter an issue where Dart can't find a segment, try rerunning your query.

Review Comment:
   I think we fixed this one in #18291, so let's remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

