This is an automated email from the ASF dual-hosted git repository.

victoria pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 2e2f3cf66a2 docs: Refresh docs for SQL input source (#17031)
2e2f3cf66a2 is described below

commit 2e2f3cf66a2e1218fb285de0981b679e9cc30878
Author: Victoria Lim <[email protected]>
AuthorDate: Mon Sep 16 15:52:37 2024 -0700

    docs: Refresh docs for SQL input source (#17031)
    
    Co-authored-by: Charles Smith <[email protected]>
---
 docs/development/extensions-core/druid-lookups.md |  4 +-
 docs/development/extensions-core/mysql.md         | 73 ++++++++++++++--------
 docs/development/extensions-core/postgresql.md    | 13 ++--
 docs/ingestion/input-sources.md                   | 74 +++++++++++------------
 docs/querying/lookups-cached-global.md            |  2 +-
 5 files changed, 95 insertions(+), 71 deletions(-)

diff --git a/docs/development/extensions-core/druid-lookups.md 
b/docs/development/extensions-core/druid-lookups.md
index d6219b8c742..06283ec4d72 100644
--- a/docs/development/extensions-core/druid-lookups.md
+++ b/docs/development/extensions-core/druid-lookups.md
@@ -31,9 +31,9 @@ This module can be used side to side with other lookup module 
like the global ca
 To use this Apache Druid extension, 
[include](../../configuration/extensions.md#loading-extensions) 
`druid-lookups-cached-single` in the extensions load list.
 
 :::info
- If using JDBC, you will need to add your database's client JAR files to the 
extension's directory.
+To use JDBC, you must add your database client JAR files to the extension's 
directory.
  For Postgres, the connector JAR is already included.
- See the MySQL extension documentation for instructions to obtain 
[MySQL](./mysql.md#installing-the-mysql-connector-library) or 
[MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) 
connector libraries.
+ See the MySQL extension documentation for instructions to obtain 
[MySQL](./mysql.md#install-mysql-connectorj) or 
[MariaDB](./mysql.md#install-mariadb-connectorj) connector libraries.
  Copy or symlink the downloaded file to 
`extensions/druid-lookups-cached-single` under the distribution root directory.
 :::
 
diff --git a/docs/development/extensions-core/mysql.md 
b/docs/development/extensions-core/mysql.md
index bc6012dbb5a..a3678f65056 100644
--- a/docs/development/extensions-core/mysql.md
+++ b/docs/development/extensions-core/mysql.md
@@ -1,6 +1,6 @@
 ---
 id: mysql
-title: "MySQL Metadata Store"
+title: "MySQL metadata store"
 ---
 
 <!--
@@ -25,41 +25,58 @@ title: "MySQL Metadata Store"
 
 To use this Apache Druid extension, 
[include](../../configuration/extensions.md#loading-extensions) 
`mysql-metadata-storage` in the extensions load list.
 
-:::info
- The MySQL extension requires the MySQL Connector/J library or MariaDB 
Connector/J library, neither of which are included in the Druid distribution.
- Refer to the following section for instructions on how to install this 
library.
-:::
+With the MySQL extension, you can use MySQL as a metadata store or ingest from 
a MySQL database.
 
-## Installing the MySQL connector library
+The extension requires a connector library that's not included with Druid.
+See the [Prerequisites](#prerequisites) for installation instructions.
 
-This extension can use Oracle's MySQL JDBC driver which is not included in the 
Druid distribution. You must
-install it separately. There are a few ways to obtain this library:
+## Prerequisites
 
-- It can be downloaded from the MySQL site at: 
https://dev.mysql.com/downloads/connector/j/
-- It can be fetched from Maven Central at: 
https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar
-- It may be available through your package manager, e.g. as `libmysql-java` on 
APT for a Debian-based OS
+To use the MySQL extension, you need to install one of the following libraries:
+* [MySQL Connector/J](#install-mysql-connectorj)
+* [MariaDB Connector/J](#install-mariadb-connectorj)
 
-This fetches the MySQL connector JAR file with a name like 
`mysql-connector-j-8.2.0.jar`.
+### Install MySQL Connector/J
 
-Copy or symlink this file inside the folder 
`extensions/mysql-metadata-storage` under the distribution root directory.
+The MySQL extension uses Oracle's MySQL JDBC driver.
+The current version of Druid uses version 8.2.0.
+Other versions may not work with this extension.
 
-## Alternative: Installing the MariaDB connector library
+You can download the library from one of the following sources:
 
-This extension also supports using the MariaDB connector jar, though it is 
also not included in the Druid distribution, so you must install it separately.
+- [MySQL website](https://dev.mysql.com/downloads/connector/j/)  
+  Visit the archives page to access older product versions.
+- [Maven Central (direct 
download)](https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar)
+- Your package manager. For example, `libmysql-java` on APT for a Debian-based 
OS.
 
-- Download from the MariaDB site: https://mariadb.com/downloads/connector
-- Download from Maven Central: 
https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar
+The download includes the MySQL connector JAR file with a name like 
`mysql-connector-j-8.2.0.jar`.
+Copy or create a symbolic link to this file inside the `lib` folder in the 
distribution root directory.
 
-This fetches the MariaDB connector JAR file with a name like 
`maria-java-client-2.7.3.jar`.
+### Install MariaDB Connector/J
 
-Copy or symlink this file to `extensions/mysql-metadata-storage` under the 
distribution root directory.
+This extension also supports the MariaDB Connector/J library.
+The current version of Druid uses version 2.7.3.
+Other versions may not work with this extension.
+
+You can download the library from one of the following sources:
+
+- [MariaDB 
website](https://mariadb.com/downloads/connectors/connectors-data-access/java8-connector)
  
+  Click **Show All Files** to access older product versions.
+- [Maven Central (direct 
download)](https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar)
+
+The download includes the MariaDB connector JAR file with a name like 
`mariadb-java-client-2.7.3.jar`.
+Copy or create a symbolic link to this file inside the `lib` folder in the 
distribution root directory.
 
 To configure the `mysql-metadata-storage` extension to use the MariaDB 
connector library instead of MySQL, set 
`druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.
 
-Depending on the MariaDB client library version, the connector supports both 
`jdbc:mysql:` and `jdbc:mariadb:` connection URIs. However, the parameters to 
configure the connection vary between implementations, so be sure to [check the 
documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings)
 for details.
+The protocol of the connection string is `jdbc:mysql:` or `jdbc:mariadb:`,
+depending on your specific version of the MariaDB client library.
+For more information on the parameters to configure a connection,
+[see the MariaDB 
documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings)
+for your connector version.
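+
+For illustration, a metadata store configuration that routes the MySQL
+extension through the MariaDB driver might look like the following sketch.
+The host, database name, and credentials are placeholders.
+
+```properties
+druid.metadata.storage.type=mysql
+# Use the MariaDB driver instead of the MySQL driver
+druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver
+# The protocol can be jdbc:mysql: or jdbc:mariadb:, depending on the client version
+druid.metadata.storage.connector.connectURI=jdbc:mariadb://localhost:3306/druid
+druid.metadata.storage.connector.user=druid
+druid.metadata.storage.connector.password=password
+```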
 
 
-## Setting up MySQL
+## Set up MySQL
 
 To avoid issues with upgrades that require schema changes to a large metadata 
table, consider a MySQL version that supports instant ADD COLUMN semantics. For 
example, MySQL 8.
 
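+With instant ADD COLUMN semantics, adding a column completes as a
+metadata-only change instead of a full table rebuild. A minimal sketch on
+MySQL 8, using a hypothetical table and column name:
+
+```sql
+-- Completes instantly on MySQL 8; errors if the change cannot be instant
+ALTER TABLE example_table
+  ADD COLUMN example_col VARCHAR(255),
+  ALGORITHM = INSTANT;
+```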
@@ -90,7 +107,7 @@ This extension also supports using MariaDB server, 
https://mariadb.org/download/
   CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
 
   -- create a druid user
-  CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
+  CREATE USER 'druid'@'localhost' IDENTIFIED BY 'password';
 
   -- grant the user all the permissions on the database we just created
   GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'localhost';
@@ -111,10 +128,11 @@ This extension also supports using MariaDB server, 
https://mariadb.org/download/
 
 If using the MariaDB connector library, set 
`druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.
 
-## Encrypting MySQL connections
-  This extension provides support for encrypting MySQL connections. To get 
more information about encrypting MySQL connections using TLS/SSL in general, 
please refer to this 
[guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).
+## Encrypt MySQL connections
 
-## Configuration
+This extension supports encrypting MySQL connections. For general information 
about encrypting MySQL connections using TLS/SSL, refer to the MySQL 
[guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).
+
+## Configuration properties
 
 |Property|Description|Default|Required|
 |--------|-----------|-------|--------|
@@ -129,7 +147,10 @@ If using the MariaDB connector library, set 
`druid.metadata.mysql.driver.driverC
 |`druid.metadata.mysql.ssl.enabledSSLCipherSuites`|Overrides the existing 
cipher suites with these cipher suites.|none|no|
 |`druid.metadata.mysql.ssl.enabledTLSProtocols`|Overrides the TLS protocols 
with these protocols.|none|no|
 
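+A sketch of an encrypted connection setup that uses the SSL properties above.
+It assumes the server certificate is verified against a local truststore;
+the paths and passwords are placeholders.
+
+```properties
+druid.metadata.mysql.ssl.useSSL=true
+druid.metadata.mysql.ssl.verifyServerCertificate=true
+druid.metadata.mysql.ssl.trustCertificateKeyStoreUrl=file:///path/to/truststore.jks
+druid.metadata.mysql.ssl.trustCertificateKeyStorePassword=secret
+# Optionally restrict the TLS protocols the connection may negotiate
+druid.metadata.mysql.ssl.enabledTLSProtocols=["TLSv1.2"]
+```
+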
-### MySQL InputSource
+## MySQL input source
+
+The MySQL extension provides an implementation of an SQL input source to 
ingest data into Druid from a MySQL database.
+For more information on the input source parameters, see [SQL input 
source](../../ingestion/input-sources.md#sql-input-source).
 
 ```json
 {
diff --git a/docs/development/extensions-core/postgresql.md 
b/docs/development/extensions-core/postgresql.md
index 919bf372b84..006a65ed427 100644
--- a/docs/development/extensions-core/postgresql.md
+++ b/docs/development/extensions-core/postgresql.md
@@ -1,6 +1,6 @@
 ---
 id: postgresql
-title: "PostgreSQL Metadata Store"
+title: "PostgreSQL metadata store"
 ---
 
 <!--
@@ -25,7 +25,9 @@ title: "PostgreSQL Metadata Store"
 
 To use this Apache Druid extension, 
[include](../../configuration/extensions.md#loading-extensions) 
`postgresql-metadata-storage` in the extensions load list.
 
-## Setting up PostgreSQL
+With the PostgreSQL extension, you can use PostgreSQL as a metadata store or 
ingest from a PostgreSQL database.
+
+## Set up PostgreSQL
 
 To avoid issues with upgrades that require schema changes to a large metadata 
table, consider a PostgreSQL version that supports instant ADD COLUMN semantics.
 
@@ -69,7 +71,7 @@ To avoid issues with upgrades that require schema changes to 
a large metadata ta
   druid.metadata.storage.connector.password=diurd
   ```
 
-## Configuration
+## Configuration properties
 
 In most cases, the configuration options map directly to the [postgres JDBC 
connection 
options](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database).
 
@@ -87,9 +89,10 @@ In most cases, the configuration options map directly to the 
[postgres JDBC conn
 | `druid.metadata.postgres.ssl.sslPasswordCallback` | The classname of the SSL 
password provider. | none | no |
 | `druid.metadata.postgres.dbTableSchema` | druid meta table schema | `public` 
| no |
 
-### PostgreSQL InputSource
+## PostgreSQL input source
 
-The PostgreSQL extension provides an implementation of an [SQL input 
source](../../ingestion/input-sources.md) which can be used to ingest data into 
Druid from a PostgreSQL database.
+The PostgreSQL extension provides an implementation of an SQL input source to 
ingest data into Druid from a PostgreSQL database.
+For more information on the input source parameters, see [SQL input 
source](../../ingestion/input-sources.md#sql-input-source).
 
 ```json
 {
diff --git a/docs/ingestion/input-sources.md b/docs/ingestion/input-sources.md
index 71340abc2c0..495b3fd8733 100644
--- a/docs/ingestion/input-sources.md
+++ b/docs/ingestion/input-sources.md
@@ -29,10 +29,8 @@ For general information on native batch indexing and 
parallel task indexing, see
 
 ## S3 input source
 
-:::info
-
-You need to include the 
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension 
to use the S3 input source.
-
+:::info Required extension
+To use the S3 input source, load the extension 
[`druid-s3-extensions`](../development/extensions-core/s3.md) in your 
`common.runtime.properties` file.
 :::
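+
+For example, the load list in `common.runtime.properties` might include the
+extension like this; the second entry is illustrative:
+
+```properties
+druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage"]
+```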
 
 The S3 input source reads objects directly from S3. You can specify either:
@@ -41,7 +39,7 @@ The S3 input source reads objects directly from S3. You can 
specify either:
 * a list of S3 location prefixes that attempts to list the contents and ingest
 all objects contained within the locations.
 
-The S3 input source is splittable. Therefore, you can use it with the 
[Parallel task](./native-batch.md). Each worker task of `index_parallel` reads 
one or multiple objects.
+The S3 input source is splittable. Therefore, you can use it with the 
[parallel task](./native-batch.md). Each worker task of `index_parallel` reads 
one or multiple objects.
 
 Sample specs:
 
@@ -219,16 +217,14 @@ If `accessKeyId` and `secretAccessKey` are not given, the 
default [S3 credential
 
 ## Google Cloud Storage input source
 
-:::info
-
-You need to include the 
[`druid-google-extensions`](../development/extensions-core/google.md) as an 
extension to use the Google Cloud Storage input source.
-
+:::info Required extension
+To use the Google Cloud Storage input source, load the extension 
[`druid-google-extensions`](../development/extensions-core/google.md) in your 
`common.runtime.properties` file.
 :::
 
 The Google Cloud Storage input source reads objects directly
from Google Cloud Storage. You can specify objects as a list of Google
Cloud Storage URI strings. The Google Cloud Storage input source is splittable
-and can be used by the [Parallel task](./native-batch.md), where each worker 
task of `index_parallel` will read
+and can be used by the [parallel task](./native-batch.md), where each worker 
task of `index_parallel` will read
 one or multiple objects.
 
 Sample specs:
@@ -307,14 +303,12 @@ Google Cloud Storage object:
 
 ## Azure input source
 
-:::info
-
-You need to include the 
[`druid-azure-extensions`](../development/extensions-core/azure.md) as an 
extension to use the Azure input source.
-
+:::info Required extension
+To use the Azure input source, load the extension 
[`druid-azure-extensions`](../development/extensions-core/azure.md) in your 
`common.runtime.properties` file.
 :::
 
 The Azure input source (that uses the type `azureStorage`) reads objects 
directly from Azure Blob store or Azure Data Lake sources. You can
-specify objects as a list of file URI strings or prefixes. You can split the 
Azure input source for use with [Parallel task](./native-batch.md) indexing and 
each worker task reads one chunk of the split data.
+specify objects as a list of file URI strings or prefixes. You can split the 
Azure input source for use with [parallel task](./native-batch.md) indexing and 
each worker task reads one chunk of the split data.
 
 The `azureStorage` input source is a new schema for Azure input sources that 
allows you to specify the storage account from which to ingest files. We 
recommend that you update any specs that use the old `azure` schema to use the 
new `azureStorage` schema. The new schema provides more functionality than the 
older `azure` schema.
 
@@ -491,15 +485,13 @@ The `objects` property is:
 
 ## HDFS input source
 
-:::info
-
-You need to include the 
[`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension 
to use the HDFS input source.
-
+:::info Required extension
+To use the HDFS input source, load the extension 
[`druid-hdfs-storage`](../development/extensions-core/hdfs.md) in your 
`common.runtime.properties` file.
 :::
 
 The HDFS input source reads files directly
from HDFS storage. You can specify file paths as an HDFS URI string or a list
-of HDFS URI strings. The HDFS input source is splittable and can be used by 
the [Parallel task](./native-batch.md),
+of HDFS URI strings. The HDFS input source is splittable and can be used by 
the [parallel task](./native-batch.md),
 where each worker task of `index_parallel` will read one or multiple files.
 
 Sample specs:
@@ -593,7 +585,7 @@ The `http` input source is not limited to the HTTP or HTTPS 
protocols. It uses t
 
 For more information about security best practices, see [Security 
overview](../operations/security-overview.md#best-practices).
 
-The HTTP input source is _splittable_ and can be used by the [Parallel 
task](./native-batch.md),
+The HTTP input source is _splittable_ and can be used by the [parallel 
task](./native-batch.md),
 where each worker task of `index_parallel` will read only one file. This input 
source does not support Split Hint Spec.
 
 Sample specs:
@@ -701,7 +693,7 @@ Sample spec:
 
 The Local input source reads files directly from local storage,
 and is mainly intended for proof-of-concept testing.
-The Local input source is _splittable_ and can be used by the [Parallel 
task](./native-batch.md),
+The Local input source is _splittable_ and can be used by the [parallel 
task](./native-batch.md),
 where each worker task of `index_parallel` will read one or multiple files.
 
 Sample spec:
@@ -736,7 +728,7 @@ Sample spec:
 
 The Druid input source reads data directly from existing Druid 
segments,
 potentially using a new schema and changing the name, dimensions, metrics, 
rollup, etc. of the segment.
-The Druid input source is _splittable_ and can be used by the [Parallel 
task](./native-batch.md).
+The Druid input source is _splittable_ and can be used by the [parallel 
task](./native-batch.md).
 This input source has a fixed input format for reading from Druid segments;
 no `inputFormat` field needs to be specified in the ingestion spec when using 
this input source.
 
@@ -833,17 +825,29 @@ For more information on the `maxNumConcurrentSubTasks` 
field, see [Implementatio
 
 ## SQL input source
 
+:::info Required extension
+To use the SQL input source, you must load the appropriate extension in your 
`common.runtime.properties` file.
+* To connect to MySQL, load the extension 
[`mysql-metadata-storage`](../development/extensions-core/mysql.md).
+* To connect to PostgreSQL, load the extension 
[`postgresql-metadata-storage`](../development/extensions-core/postgresql.md).
+
+The MySQL extension requires a JDBC driver.
+For more information, see [Install MySQL 
Connector/J](../development/extensions-core/mysql.md#install-mysql-connectorj).
+:::
+
 The SQL input source is used to read data directly from an RDBMS.
-The SQL input source is _splittable_ and can be used by the [Parallel 
task](./native-batch.md), where each worker task will read from one SQL query 
from the list of queries.
+You can _split_ the ingestion tasks for a SQL input source. When you use the 
[parallel task](./native-batch.md) type, each worker task reads the results of 
one SQL query from the list of queries.
 This input source does not support Split Hint Spec.
-Since this input source has a fixed input format for reading events, no 
`inputFormat` field needs to be specified in the ingestion spec when using this 
input source.
-Please refer to the Recommended practices section below before using this 
input source.
+
+The SQL input source has a fixed input format for reading events.
+Don't specify `inputFormat` when using this input source.
+
+Refer to the [recommended practices](#recommended-practices) before using this 
input source.
 
 |Property|Description|Required|
 |--------|-----------|---------|
 |type|Set the value to `sql`.|Yes|
-|database|Specifies the database connection details. The database type 
corresponds to the extension that supplies the `connectorConfig` support. The 
specified extension must be loaded into 
Druid:<br/><br/><ul><li>[mysql-metadata-storage](../development/extensions-core/mysql.md)
 for `mysql`</li><li> 
[postgresql-metadata-storage](../development/extensions-core/postgresql.md) 
extension for `postgresql`.</li></ul><br/><br/>You can selectively allow JDBC 
properties in `connectURI`. See [JDBC  [...]
-|foldCase|Toggle case folding of database column names. This may be enabled in 
cases where the database returns case insensitive column names in query 
results.|No|
+|database|Specifies the database connection details. The database type 
corresponds to the extension that supplies the `connectorConfig` 
support.<br/><br/>You can selectively allow JDBC properties in `connectURI`. 
See [JDBC connections security 
config](../configuration/index.md#jdbc-connections-to-external-databases) for 
more details.|Yes|
+|foldCase|Boolean to toggle case folding of database column names. For 
example, to ingest a database column named `Entry_Date` as `entry_date`, set 
`foldCase` to true and include `entry_date` in the 
[`dimensionsSpec`](ingestion-spec.md#dimensionsspec).|No|
 |sqls|List of SQL queries where each SQL query would retrieve the data to be 
indexed.|Yes|
 
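+To illustrate `foldCase`, the following input source fragment ingests a
+database column named `Entry_Date` as `entry_date`; pair it with `entry_date`
+in the `dimensionsSpec`. The connection details are placeholders.
+
+```json
+{
+  "type": "sql",
+  "database": {
+    "type": "mysql",
+    "connectorConfig": {
+      "connectURI": "jdbc:mysql://host:port/schema",
+      "user": "admin",
+      "password": "secret"
+    }
+  },
+  "foldCase": true,
+  "sqls": ["SELECT Entry_Date, Item FROM sales"]
+}
+```
+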
 The following is an example of an SQL input source spec:
@@ -887,7 +891,7 @@ Compared to the other native batch input sources, SQL input 
source behaves diffe
 
 The Combining input source lets you read data from multiple input sources.
 It identifies the splits from delegate input sources and uses a worker task to 
process each split.
-Use the Combining input source only if all the delegates are splittable and 
can be used by the [Parallel task](./native-batch.md).
+Each delegate input source must be splittable and compatible with the 
[parallel task type](./native-batch.md).
 
 Similar to other input sources, the Combining input source supports a single 
`inputFormat`.
 Delegate input sources that require an `inputFormat` must have the same format 
for input data.
@@ -931,10 +935,8 @@ The following is an example of a Combining input source 
spec:
 
 ## Iceberg input source
 
-:::info
-
-To use the Iceberg input source, load the extension 
[`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md).
-
+:::info Required extension
+To use the Iceberg input source, load the extension 
[`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md) in 
your `common.runtime.properties` file.
 :::
 
 You use the Iceberg input source to read data stored in the Iceberg table 
format. For a given table, the input source scans up to the latest Iceberg 
snapshot from the configured Hive catalog. Druid ingests the underlying live 
data files using the existing input source formats.
@@ -1138,10 +1140,8 @@ This input source provides the following filters: `and`, 
`equals`, `interval`, a
 
 ## Delta Lake input source
 
-:::info
-
-To use the Delta Lake input source, load the extension 
[`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md).
-
+:::info Required extension
+To use the Delta Lake input source, load the extension 
[`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md) 
in your `common.runtime.properties` file.
 :::
 
 You can use the Delta input source to read data stored in a Delta Lake table. 
For a given table, the input source scans
diff --git a/docs/querying/lookups-cached-global.md 
b/docs/querying/lookups-cached-global.md
index 72c4189c2da..a0208b17bc3 100644
--- a/docs/querying/lookups-cached-global.md
+++ b/docs/querying/lookups-cached-global.md
@@ -377,7 +377,7 @@ The JDBC lookups will poll a database to populate its local 
cache. If the `tsCol
 :::info
  If using JDBC, you will need to add your database's client JAR files to the 
extension's directory.
  For Postgres, the connector JAR is already included.
- See the MySQL extension documentation for instructions to obtain 
[MySQL](../development/extensions-core/mysql.md#installing-the-mysql-connector-library)
 or 
[MariaDB](../development/extensions-core/mysql.md#alternative-installing-the-mariadb-connector-library)
 connector libraries.
+ See the MySQL extension documentation for instructions to obtain 
[MySQL](../development/extensions-core/mysql.md#install-mysql-connectorj) or 
[MariaDB](../development/extensions-core/mysql.md#install-mariadb-connectorj) 
connector libraries.
  The connector JAR should reside in the classpath of Druid's main class loader.
  To add the connector JAR to the classpath, you can copy the downloaded file 
to `lib/` under the distribution root directory. Alternatively, create a 
symbolic link to the connector in the `lib` directory.
 :::


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
