[GitHub] [druid] gianm commented on a diff in pull request #13365: Druid automated quickstart

GitBox Tue, 29 Nov 2022 23:14:10 -0800


gianm commented on code in PR #13365:
URL: https://github.com/apache/druid/pull/13365#discussion_r1035054872



##########
docs/operations/single-server.md:
##########
@@ -42,37 +43,43 @@ The startup scripts for these example configurations run a 
single ZK instance al
 
 The example configurations run the Druid Coordinator and Overlord together in 
a single process using the optional configuration 
`druid.coordinator.asOverlord.enabled=true`, described in the [Coordinator 
configuration documentation](../configuration/index.md#coordinator-operation).
 
+The `start-druid` is a generic launch script for starting druid services on 
single server, it accepts optional arguments like services, memory and config.

Review Comment:
   This should mention how to see what the arguments are. I suggest including 
some sentence like:
   
   > For details about possible arguments, run `bin/start-druid --help`.



##########
docs/operations/single-server.md:
##########
@@ -31,6 +31,7 @@ Druid includes a set of reference configurations and launch 
scripts for single-m
 - `medium`
 - `large`
 - `xlarge`
+- `start-druid`

Review Comment:
   Hmm. Reading this makes me realize an inconsistency in the naming of things. 
The names here are the names of the config directories. The start scripts are 
_mostly_ aligned with the config directories, but not quite; we have 
`start-micro-quickstart` and `start-nano-quickstart` for the smaller ones, but 
`start-single-server-small`, `start-single-server-medium`, etc, for the bigger 
ones.
   
   I think we ought to clean this up. I'm open to ideas but here is a 
suggestion, where we include both:
   
   1. Rename `conf/druid/single-server/quickstart` to 
`conf/druid/single-server/auto`.
   2. Use the following text:
   
   ```
   Druid includes a set of reference configurations and launch scripts for 
single-machine deployments.
   These configuration bundles are located in `conf/druid/single-server/`.
   
   The `auto` configuration sizes runtime parameters based on available 
processors and memory. Other configurations include hard-coded runtime 
parameters for various server sizes. Most users should stick with `auto`.
   
   - `auto` (run script: `bin/start-druid`)
   - `nano-quickstart` (run script: `bin/start-nano-quickstart`)
   - `micro-quickstart` (run script: `bin/start-micro-quickstart`)
   - `small` (run script: `bin/start-single-server-small`)
   - `medium` (run script: `bin/start-single-server-medium`)
   - `large` (run script: `bin/start-single-server-large`)
   - `xlarge` (run script: `bin/start-single-server-xlarge`)
   ```
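
To make the naming mismatch concrete, here is a minimal sketch that checks the list above against one possible rule. The rule itself (`bin/start-<config directory name>`) and the helper names `expected_script` and `misaligned` are assumptions for illustration, not something the PR defines; `auto` assumes the rename proposed in step 1.

```python
# Config directory name -> run script, copied from the suggested list above.
RUN_SCRIPTS = {
    "auto": "bin/start-druid",
    "nano-quickstart": "bin/start-nano-quickstart",
    "micro-quickstart": "bin/start-micro-quickstart",
    "small": "bin/start-single-server-small",
    "medium": "bin/start-single-server-medium",
    "large": "bin/start-single-server-large",
    "xlarge": "bin/start-single-server-xlarge",
}


def expected_script(config_name):
    """Hypothetical naming rule: the script is named after the config directory."""
    return f"bin/start-{config_name}"


def misaligned(run_scripts):
    """Return the config names whose run script does not follow the rule."""
    return sorted(
        name
        for name, script in run_scripts.items()
        if script != expected_script(name)
    )


if __name__ == "__main__":
    for name in misaligned(RUN_SCRIPTS):
        print(f"{name}: {RUN_SCRIPTS[name]} (rule would give {expected_script(name)})")
```

Under this rule only the two `*-quickstart` configs line up with their scripts, which is why the suggested text spells out the run script next to each config name.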



##########
docs/tutorials/index.md:
##########
@@ -72,38 +73,39 @@ The distribution directory contains `LICENSE` and `NOTICE` 
files and subdirector
 
 ## Start up Druid services
 
-Start up Druid services using the `micro-quickstart` single-machine 
configuration.
+Start up Druid services using the `micro` single-machine configuration.
 This configuration includes default settings that are appropriate for this 
tutorial, such as loading the `druid-multi-stage-query` extension by default so 
that you can use the MSQ task engine.
 
-You can view that setting and others in the configuration files in the 
`conf/druid/single-server/micro-quickstart/`. 
+You can view that setting and others in the configuration files in the 
`conf/druid/single-server/quickstart/`. 
 
 From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
 
 ```bash
-./bin/start-micro-quickstart
+./bin/start-druid --memory=16g

Review Comment:
   Let's remove `--memory=16g` and allow it to be automatically sized.



##########
docs/tutorials/index.md:
##########
@@ -72,38 +73,39 @@ The distribution directory contains `LICENSE` and `NOTICE` 
files and subdirector
 
 ## Start up Druid services
 
-Start up Druid services using the `micro-quickstart` single-machine 
configuration.
+Start up Druid services using the `micro` single-machine configuration.
 This configuration includes default settings that are appropriate for this 
tutorial, such as loading the `druid-multi-stage-query` extension by default so 
that you can use the MSQ task engine.
 
-You can view that setting and others in the configuration files in the 
`conf/druid/single-server/micro-quickstart/`. 
+You can view that setting and others in the configuration files in the 
`conf/druid/single-server/quickstart/`. 
 
 From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
 
 ```bash
-./bin/start-micro-quickstart
+./bin/start-druid --memory=16g
 ```
 
 This brings up instances of ZooKeeper and the Druid services:
 
 ```bash
 $ ./bin/start-micro-quickstart

Review Comment:
   `./bin/start-druid`



##########
docs/tutorials/tutorial-batch-hadoop.md:
##########
@@ -28,7 +28,7 @@ This tutorial shows you how to load data files into Apache 
Druid using a remote
 
 For this tutorial, we'll assume that you've already completed the previous
 [batch ingestion tutorial](tutorial-batch.md) using Druid's native batch 
ingestion system and are using the
-`micro-quickstart` single-machine configuration as described in the 
[quickstart](index.md).
+`micro` single-machine configuration as described in the 
[quickstart](../operations/single-server.md#micro-4-cpu-16gib-ram).

Review Comment:
   `auto`, not `micro`



##########
docs/tutorials/index.md:
##########
@@ -23,7 +23,7 @@ title: "Quickstart (local)"
   -->
 
 
-This quickstart gets you started with Apache Druid using the 
[`micro-quickstart`](../operations/single-server.md#micro-quickstart-4-cpu-16gib-ram)
 configuration, and introduces you to Druid ingestion and query features.
+This quickstart gets you started with Apache Druid using the 
[`micro`](../operations/single-server.md#micro-4-cpu-16gib-ram) configuration, 
and introduces you to Druid ingestion and query features.

Review Comment:
   Ah, I meant to suggest updating the tutorial to use `bin/start-druid` rather 
than the micro config. I don't think we need to mention the size of the machine 
at all here, since it's automatic now. Something like this is good:
   
   ```
   This quickstart gets you started with Apache Druid and introduces you to 
Druid ingestion and query features.
   ```



##########
examples/bin/supervise:
##########
@@ -179,13 +195,28 @@ usage() unless GetOptions(
   'vardir|d=s',
   'kill-timeout|t=i',
   'chdir=s',
-  'svlogd:s'
+  'svlogd:s',
+  'array|a=s{,}'

Review Comment:
   I agree that calling this `commands` makes more sense. It's OK to use the 
name of a declared var here, since it would be accessed like `$opt{'commands'}` 
not `@commands`. It's different enough that it's not ambiguous.
   
   You can also fix the funky `,` thing by using `s@`, which writes to a string 
array.
   
   So:
   
   - Instead of `array|a=s{,}`, try `command=s@`.
   - Generate it in python as `["--command", "<line1>", "--command", "<line2>", 
"--command", "<line3>"]` (the `--command` is repeated each time).
   - Read it in perl using `@config_lines = @{$opt{'command'}}`. There won't be a 
need for `split` or for using `;` to mean `,`.
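
The Python side of this suggestion could look roughly like the sketch below. The helper name `build_command_args` and the sample lines are hypothetical; the only part taken from the comment is the shape of the resulting list, with `--command` repeated before every line. With Getopt::Long, an `s@` option appends each occurrence to an array, so every line arrives as its own element.

```python
def build_command_args(config_lines):
    """Return ['--command', line1, '--command', line2, ...].

    Each line becomes its own argv element, so it may contain spaces,
    commas, or semicolons without any escaping or joining.
    """
    args = []
    for line in config_lines:
        args.extend(["--command", line])
    return args


if __name__ == "__main__":
    # Placeholder lines for illustration only; the real content comes from
    # whatever the launch script generates.
    sample_lines = ["first line, with a comma", "second line; with a semicolon"]
    print(build_command_args(sample_lines))
```

The returned list can be appended as-is to the argument list passed to `supervise`, since no shell quoting is involved when arguments are passed as a list.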



##########
docs/operations/single-server.md:
##########
@@ -42,37 +43,43 @@ The startup scripts for these example configurations run a 
single ZK instance al
 
 The example configurations run the Druid Coordinator and Overlord together in 
a single process using the optional configuration 
`druid.coordinator.asOverlord.enabled=true`, described in the [Coordinator 
configuration documentation](../configuration/index.md#coordinator-operation).
 
+The `start-druid` is a generic launch script for starting druid services on 
single server, it accepts optional arguments like services, memory and config.
+All reference configurations can be acheived by passing appropriate arguments 
to this script.
+Existing launch scripts are deprecated and will be removed in the next 
release. 
+
+
 While example configurations are provided for very large single machines, at 
higher scales we recommend running Druid in a [clustered 
deployment](../tutorials/cluster.md), for fault-tolerance and reduced resource 
contention.
 
 ## Single server reference configurations
 
-### Nano-Quickstart: 1 CPU, 4GiB RAM
+### Nano: 1 CPU, 4GiB RAM
 
-- Launch command: `bin/start-nano-quickstart`

Review Comment:
   Please revert these changes -- this part of the doc is meant to describe the 
nano, micro, etc, bundled configs, which have their own run commands and config 
directories. We may change that at some point but we haven't yet.



##########
docs/tutorials/index.md:
##########
@@ -37,7 +37,8 @@ Druid supports a variety of ingestion options. Once you're 
done with this tutori
 
 You can follow these steps on a relatively modest machine, such as a 
workstation or virtual server with 16 GiB of RAM.
 
-Druid comes equipped with several [startup configuration 
profiles](../operations/single-server.md) for a
+Druid comes equipped with a single launch script that can be used to run 
several 
+[startup configuration profiles](../operations/single-server.md) for a

Review Comment:
   Here, I'd mention the automatic one first. Something like:
   
   > Druid comes equipped with launch scripts that can be used to start all 
processes on a single server. Here, we will use `bin/start-druid`, which 
automatically sets various runtime properties based on available processors and 
memory.
   >
   > In addition, Druid includes several `[bundled non-automatic 
profiles](../operations/single-server.md)` for a range of machine sizes. These 
range from `nano` (1 CPU, 4GiB RAM) to `xlarge` (64 CPU, 512GiB RAM). We won't 
use those here, but for more information, see `[Single server 
deployment](../operations/single-server.md)`. For additional information on 
deploying Druid services across clustered machines, see `[Clustered 
deployment](./cluster.md)`.



##########
docs/tutorials/index.md:
##########
@@ -72,38 +73,39 @@ The distribution directory contains `LICENSE` and `NOTICE` 
files and subdirector
 
 ## Start up Druid services
 
-Start up Druid services using the `micro-quickstart` single-machine 
configuration.
+Start up Druid services using the `micro` single-machine configuration.

Review Comment:
   `auto`, not `micro`



##########
docs/operations/single-server.md:
##########
@@ -42,37 +43,43 @@ The startup scripts for these example configurations run a 
single ZK instance al
 
 The example configurations run the Druid Coordinator and Overlord together in 
a single process using the optional configuration 
`druid.coordinator.asOverlord.enabled=true`, described in the [Coordinator 
configuration documentation](../configuration/index.md#coordinator-operation).
 
+The `start-druid` is a generic launch script for starting druid services on 
single server, it accepts optional arguments like services, memory and config.

Review Comment:
   It should also mention something about how the computations are done. (All 
available processors, 80% of memory, etc.)
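
As a rough illustration of the computation the doc could describe, here is a minimal sketch that uses all available processors and 80% of total memory. The function names and the way total memory is detected are assumptions for illustration and are not taken from the `start-druid` script itself.

```python
import os


def auto_size(total_memory_bytes, available_processors, memory_percent=80):
    """Return the resources to hand to Druid: all processors, a share of memory.

    Integer arithmetic keeps the result exact for large byte counts.
    """
    return {
        "processors": available_processors,
        "memory_bytes": total_memory_bytes * memory_percent // 100,
    }


def detect_total_memory_bytes():
    """Assumption: a Unix-like system where these sysconf names are available."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    sizing = auto_size(detect_total_memory_bytes(), os.cpu_count())
    print(sizing)
```

A sentence in the doc stating these two defaults, next to the pointer to `bin/start-druid --help`, would let readers predict what the script does on their machine.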



##########
docs/tutorials/tutorial-kafka.md:
##########
@@ -30,7 +30,7 @@ The tutorial guides you through the steps to load sample 
nested clickstream data
 
 ## Prerequisites
 
-Before you follow the steps in this tutorial, download Druid as described in 
the [quickstart](index.md) using the 
[micro-quickstart](../operations/single-server.md#micro-quickstart-4-cpu-16gib-ram)
 single-machine configuration and have it running on your local machine. You 
don't need to have loaded any data.
+Before you follow the steps in this tutorial, download Druid as described in 
the [quickstart](index.md) using the 
[micro](../operations/single-server.md#micro-4-cpu-16gib-ram) single-machine 
configuration and have it running on your local machine. You don't need to have 
loaded any data.

Review Comment:
   `auto`, not `micro`


