techdocsmith commented on code in PR #16862:
URL: https://github.com/apache/druid/pull/16862#discussion_r1710411870
##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time
lookup table.
## LOWER
-`LOWER(expr)`
+Returns the expression in lowercase.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_string",
+ LOWER("OriginCityName") AS "make_lowercase"
Review Comment:
```suggestion
LOWER("OriginCityName") AS "lowercase"
```
##########
docs/querying/sql-functions.md:
##########
@@ -1309,11 +1373,33 @@ Returns the relative rank of the row calculated as a
percentage according to the
## POSITION
-`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
+Returns the one-based index position of a substring within an expression,
optionally starting from a given one-based index. If `substring` is not found,
returns 0.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
-Returns the one-based index position of a substring within an expression,
optionally starting from a given one-based index.
+The following example returns the one-based index of the substring `PR` in the
`OriginCityName` column from the `flight-carriers` datasource starting from
index 5.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_expr",
+ POSITION('PR' IN "OriginCityName" FROM 5) AS "index_found"
Review Comment:
`index_found` makes me think of a boolean. this actually returns the index.
`index` is probably enough. if it is a 1-based index 0 is not in range, so
understandably, not found.
##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time
lookup table.
## LOWER
-`LOWER(expr)`
+Returns the expression in lowercase.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_string",
+ LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |
+| -- | -- |
+`San Juan, PR` | `san juan, pr` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
-Returns the expression in lowercase.
## LPAD
-`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, left-padded with `chars`. The
default value for `chars` is a space character. Truncates `expr` to `length` if
`length` is shorter than the length of `expr`.
Review Comment:
```suggestion
Returns a string of size `length` from `expr`. When the length of `expr` is
less than `chars`, left pads `expr` with `chars`, which defaults to the space
character. Truncates `expr` to `length` if `length` is shorter than the length
of `expr`.
```
Similarly for rpad
##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time
lookup table.
## LOWER
-`LOWER(expr)`
+Returns the expression in lowercase.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_string",
+ LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |
Review Comment:
```suggestion
| `original_string` | `lowercase` |
```
##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time
lookup table.
## LOWER
-`LOWER(expr)`
+Returns the expression in lowercase.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_string",
+ LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |
+| -- | -- |
+`San Juan, PR` | `san juan, pr` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
-Returns the expression in lowercase.
## LPAD
-`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, left-padded with `chars`. The
default value for `chars` is a space character. Truncates `expr` to `length` if
`length` is shorter than the length of `expr`.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LPAD(expr, length[, chars])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example inserts `+` characters as padding to the left of the
`OriginState` column from the `flight-carriers` datasource, making the return
expression a length of 5 characters.
Review Comment:
```suggestion
The following example left pads the value of `OriginState` from
`flight-carriers` to return a total of 5 characters.
```
##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window.
For example, if two rows t
## REGEXP_EXTRACT
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture
group. If `N` is unspecified or zero, returns the first substring that matches
the pattern. Returns `null` if there is no matching pattern.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the
`OriginCityName` column from the `flight-carriers` datasource by matching what
comes before the comma.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_expr",
+ REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |
+| -- | -- |
+| `San Juan, PR` | `San Juan`|
+
+</details>
-Applies a regular expression to the string expression and returns the _n_th
match.
+[Learn more](sql-scalar.md#string-functions)
## REGEXP_LIKE
-`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
+Returns `true` if the regular expression `pattern` finds a match in `expr`.
Returns `false` otherwise.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_LIKE(expr, pattern)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example returns `true` when the `OriginCityName` column from
`flight-carriers` has a city name with two words.
Review Comment:
Would "city name with words separated by a space" be more accurate?
##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window.
For example, if two rows t
## REGEXP_EXTRACT
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture
group. If `N` is unspecified or zero, returns the first substring that matches
the pattern. Returns `null` if there is no matching pattern.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the
`OriginCityName` column from the `flight-carriers` datasource by matching what
comes before the comma.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_expr",
+ REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |
Review Comment:
just `pattern` ?
##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window.
For example, if two rows t
## REGEXP_EXTRACT
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture
group. If `N` is unspecified or zero, returns the first substring that matches
the pattern. Returns `null` if there is no matching pattern.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the
`OriginCityName` column from the `flight-carriers` datasource by matching what
comes before the comma.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_expr",
+ REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |
+| -- | -- |
+| `San Juan, PR` | `San Juan`|
+
+</details>
-Applies a regular expression to the string expression and returns the _n_th
match.
+[Learn more](sql-scalar.md#string-functions)
## REGEXP_LIKE
-`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
+Returns `true` if the regular expression `pattern` finds a match in `expr`.
Returns `false` otherwise.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_LIKE(expr, pattern)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example returns `true` when the `OriginCityName` column from
`flight-carriers` has a city name with two words.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_expr",
+ REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found"
+FROM "flight-carriers"
+LIMIT 2
+```
-Returns true or false signifying whether the regular expression finds a match
in the string expression.
+Returns the following:
+
+| `original_expr` | `pattern_found` |
+| -- | -- |
+| `San Juan, PR` | `true` |
+| `Boston, MA` | `false` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
## REGEXP_REPLACE
-`REGEXP_REPLACE(<CHARACTER>, <CHARACTER>, <CHARACTER>)`
+Replaces all occurrences of a regular expression in a string expression with a
replacement string. The replacement string may refer to capture groups using
`$1`, `$2`, etc.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example matches instances of the word `Fort` and replaces it
with it's abbreviation `Ft.`
+
+```sql
+SELECT
+ 'Fort Lauderdale, FL' AS "original_expr",
+ REGEXP_REPLACE('Fort Lauderdale, FL', 'Fort', 'Ft.') AS
"replace_fort_with_abbreviation"
+```
+
+Returns the following:
+
+| `original_expr` | `replace_fort_with_abbreviation` |
Review Comment:
consider different column aliases. Have you been using abbreviations like
"expr" regularly or have you been using the words spelled out?
##########
docs/querying/sql-functions.md:
##########
@@ -1753,11 +1922,32 @@ For more information, see [UNNEST](./sql.md#unnest).
## UPPER
-`UPPER(expr)`
+Returns the expression in uppercase.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `UPPER(expr)`
+* **Function type:** Scalar, string
-Returns the expression in uppercase.
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the
`flight-carriers` datasource to uppercase.
+
+```sql
+SELECT
+ "OriginCityName" AS "original_string",
+ UPPER("OriginCityName") AS "make_uppercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_uppercase` |
Review Comment:
`origin_city`, `uppercase`
##########
docs/querying/sql-functions.md:
##########
@@ -1414,11 +1562,32 @@ Returns the number of the row within the window
starting from 1.
## RPAD
-`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, right-padded with `chars`. The
default value for `chars` is a space character. Truncates `expr` to `length` if
`length` is shorter than the length of `expr`.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `RPAD(expr, length[, chars])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example inserts `+` characters as padding to the right of the
`OriginState` column from the `flight-carriers` datasource, making the return
expression a length of `5` characters.
Review Comment:
why does the length of the return string matter?
##########
docs/querying/sql-scalar.md:
##########
@@ -105,11 +105,11 @@ String functions accept strings, and return a type
appropriate to the function.
|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a
registered [query-time lookup table](lookups.md) named `lookupName`. The
optional constant `replaceMissingValueWith`, if provided, is returned when the
`expr` is null or when the lookup does not contain a value for `expr`.<br /><br
/>Lookups can also be queried directly using the [`lookup`
schema](sql.md#from).|
|`LOWER(expr)`|Returns `expr` in all lowercase.|
|`UPPER(expr)`|Returns `expr` in all uppercase.|
-|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
left-padded with `chars`. If `length` is shorter than the length of `expr`, the
result is `expr` which is truncated to `length`. The result will be null if
either `expr` or `chars` is null. If `chars` is an empty string, no padding is
added, however `expr` may be trimmed if necessary.|
-|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
right-padded with `chars`. If `length` is shorter than the length of `expr`,
the result is `expr` which is truncated to `length`. The result will be null if
either `expr` or `chars` is null. If `chars` is an empty string, no padding is
added, however `expr` may be trimmed if necessary.|
+|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
left-padded with `chars`. If no `chars` is specified, `expr` is padded with
space characters. If `length` is shorter than the length of `expr`, `expr` is
truncated to `length`. If `chars` is an empty string, no padding is added. The
result will be null if either `expr` or `chars` is null.|
+|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
right-padded with `chars`. If no `chars` is specified, `expr` is padded with
space characters. If `length` is shorter than the length of `expr`, `expr` is
truncated to `length`. If `chars` is an empty string, no padding is added. The
result will be null if either `expr` or `chars` is null.|
|`PARSE_LONG(string, [radix])`|Parses a string into a long (BIGINT) with the
given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of `needle`
within `haystack`, with indexes starting from 1. The search will begin at
`fromIndex`, or 1 if `fromIndex` is not specified. If `needle` is not found,
returns 0.|
-|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern`
to `expr` and extract a capture group, or `NULL` if there is no match. If index
is unspecified or zero, returns the first substring that matched the pattern.
The pattern may match anywhere inside `expr`; if you want to match the entire
string instead, use the `^` and `$` markers at the start and end of your
pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not
possible to differentiate an empty-string match from a non-match (both will
return `NULL`).|
+|`REGEXP_EXTRACT(expr, pattern, [N])`|Apply regular expression `pattern` to
`expr` and extract the `N`-th capture group, or `NULL` if there is no match. If
`N` is unspecified or zero, returns the first substring that matched the
pattern. The pattern may match anywhere inside `expr`; if you want to match the
entire string instead, use the `^` and `$` markers at the start and end of your
pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not
possible to differentiate an empty-string match from a non-match (both will
return `NULL`).|
Review Comment:
```suggestion
|`REGEXP_EXTRACT(expr, pattern, [N])`|Apply regular expression `pattern` to
`expr` and extract the `N`-th capture group, or `NULL` if there is no match. If
`N` is unspecified or zero, returns the first substring that matched the
pattern. The pattern may match anywhere inside `expr`. To match the entire
string, use the `^` and `$` markers at the start and end of your pattern. Note:
when `druid.generic.useDefaultValueForNull = true`, it is not possible to
differentiate an empty-string match from a non-match (both will return `NULL`).|
```
##########
docs/querying/sql-scalar.md:
##########
@@ -105,11 +105,11 @@ String functions accept strings, and return a type
appropriate to the function.
|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a
registered [query-time lookup table](lookups.md) named `lookupName`. The
optional constant `replaceMissingValueWith`, if provided, is returned when the
`expr` is null or when the lookup does not contain a value for `expr`.<br /><br
/>Lookups can also be queried directly using the [`lookup`
schema](sql.md#from).|
|`LOWER(expr)`|Returns `expr` in all lowercase.|
|`UPPER(expr)`|Returns `expr` in all uppercase.|
-|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
left-padded with `chars`. If `length` is shorter than the length of `expr`, the
result is `expr` which is truncated to `length`. The result will be null if
either `expr` or `chars` is null. If `chars` is an empty string, no padding is
added, however `expr` may be trimmed if necessary.|
-|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
right-padded with `chars`. If `length` is shorter than the length of `expr`,
the result is `expr` which is truncated to `length`. The result will be null if
either `expr` or `chars` is null. If `chars` is an empty string, no padding is
added, however `expr` may be trimmed if necessary.|
+|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`
left-padded with `chars`. If no `chars` is specified, `expr` is padded with
space characters. If `length` is shorter than the length of `expr`, `expr` is
truncated to `length`. If `chars` is an empty string, no padding is added. The
result will be null if either `expr` or `chars` is null.|
Review Comment:
```suggestion
|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`. If
`expr` is shorter than `length`, left pads `expr` with `chars`, which defaults
to space characters. If `expr` exceeds `length`, truncates `expr` to equal
`length`. If `chars` is an empty string, no padding is added. Returns `null`
if either `expr` or `chars` is null.|
```
same for RPAD
##########
docs/querying/sql-functions.md:
##########
@@ -1293,11 +1337,31 @@ Parses `expr` into a `COMPLEX<json>` object. This
operator deserializes JSON val
## PARSE_LONG
-`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
+Converts a string into a long(BIGINT) with the given radix, or into
DECIMAL(base 10) if a radix is not provided.
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:**`PARSE_LONG(string, [radix])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the string representation of the binary, radix
2, number `1100` into its long (BIGINT) equivalent.
+
+```sql
+SELECT
+ '1100' AS "binary_number_as_string",
+ PARSE_LONG('1110', 2) AS "parse_binary_string_into_BIGINT"
+```
+
+Returns the following:
+
+| `binary_number_as_string` | `parse_binary_string_into_BIGINT` |
Review Comment:
you haven't been in the habit of using any caps in the output columns, so
this feels inconsistent. also I'd encourage you to look at a way to be a hint
more concise in the names if possible. would `binary_as_string` and
`bigint_value` work? What about for the other functions with long output
columns?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]