Re: [PR] [Docs] Batch07: adding examples to string functions (druid)

via GitHub Thu, 08 Aug 2024 16:57:47 -0700


techdocsmith commented on code in PR #16862:
URL: https://github.com/apache/druid/pull/16862#discussion_r1710411870



##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time 
lookup table.
 
 ## LOWER
 
-`LOWER(expr)`
+Returns the expression in lowercase.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the 
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_string",
+  LOWER("OriginCityName") AS "make_lowercase"

Review Comment:
   ```suggestion
     LOWER("OriginCityName") AS "lowercase"
   ```



##########
docs/querying/sql-functions.md:
##########
@@ -1309,11 +1373,33 @@ Returns the relative rank of the row calculated as a 
percentage according to the
 
 ## POSITION
 
-`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
+Returns the one-based index position of a substring within an expression, 
optionally starting from a given one-based index. If `substring` is not found, 
returns 0.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
 
-Returns the one-based index position of a substring within an expression, 
optionally starting from a given one-based index.
+The following example returns the one-based index of the substring `PR` in the 
`OriginCityName` column from the `flight-carriers` datasource starting from 
index 5.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_expr",
+  POSITION('PR' IN "OriginCityName" FROM 5) AS "index_found"

Review Comment:
   `index_found` makes me think of a boolean. this actually returns the index. 
`index` is probably enough. if it is a 1-based index 0 is not in range, so 
understandably, not found.



##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time 
lookup table.
 
 ## LOWER
 
-`LOWER(expr)`
+Returns the expression in lowercase.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the 
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_string",
+  LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |
+| -- | -- |
+`San Juan, PR` | `san juan, pr` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
 
-Returns the expression in lowercase.
 
 ## LPAD
 
-`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, left-padded with `chars`. The 
default value for `chars` is a space character. Truncates `expr` to `length` if 
`length` is shorter than the length of `expr`.

Review Comment:
   ```suggestion
   Returns a string of size `length` from `expr`. When the length of `expr` is 
less than `chars`, left pads `expr` with `chars`, which defaults to the space 
character. Truncates `expr` to `length` if `length` is shorter than the length 
of `expr`.
   ```
   Similarly for rpad



##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time 
lookup table.
 
 ## LOWER
 
-`LOWER(expr)`
+Returns the expression in lowercase.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the 
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_string",
+  LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |

Review Comment:
   ```suggestion
   | `original_string` | `lowercase` |
   ```



##########
docs/querying/sql-functions.md:
##########
@@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time 
lookup table.
 
 ## LOWER
 
-`LOWER(expr)`
+Returns the expression in lowercase.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LOWER(expr)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the 
`flight-carriers` datasource to lowercase.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_string",
+  LOWER("OriginCityName") AS "make_lowercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_lowercase` |
+| -- | -- |
+`San Juan, PR` | `san juan, pr` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
 
-Returns the expression in lowercase.
 
 ## LPAD
 
-`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, left-padded with `chars`. The 
default value for `chars` is a space character. Truncates `expr` to `length` if 
`length` is shorter than the length of `expr`.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `LPAD(expr, length[, chars])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example inserts `+` characters as padding to the left of the 
`OriginState` column from the `flight-carriers` datasource, making the return 
expression a length of 5 characters.

Review Comment:
   ```suggestion
   The following example left pads the value of `OriginState` from 
`flight-carriers` to return a total of 5 characters.
   ```



##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window. 
For example, if two rows t
 
 ## REGEXP_EXTRACT
 
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture 
group. If `N` is unspecified or zero, returns the first substring that matches 
the pattern. Returns `null` if there is no matching pattern.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string 
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the 
`OriginCityName` column from the `flight-carriers` datasource by matching what 
comes before the comma.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_expr",
+  REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |
+| -- | -- |
+| `San Juan, PR` | `San Juan`|
+
+</details>
 
-Applies a regular expression to the string expression and returns the _n_th 
match.
+[Learn more](sql-scalar.md#string-functions)
 
 ## REGEXP_LIKE
 
-`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
+Returns `true` if the regular expression `pattern` finds a match in `expr`. 
Returns `false` otherwise.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_LIKE(expr, pattern)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example returns `true` when the `OriginCityName` column from 
`flight-carriers` has a city name with two words.

Review Comment:
   Would "city name with words separated by a space" be more accurate?



##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window. 
For example, if two rows t
 
 ## REGEXP_EXTRACT
 
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture 
group. If `N` is unspecified or zero, returns the first substring that matches 
the pattern. Returns `null` if there is no matching pattern.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string 
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the 
`OriginCityName` column from the `flight-carriers` datasource by matching what 
comes before the comma.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_expr",
+  REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |

Review Comment:
   just `pattern` ?



##########
docs/querying/sql-functions.md:
##########
@@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window. 
For example, if two rows t
 
 ## REGEXP_EXTRACT
 
-`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
+Apply regular expression `pattern` to `expr` and extract the `N`-th capture 
group. If `N` is unspecified or zero, returns the first substring that matches 
the pattern. Returns `null` if there is no matching pattern.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
+* **Function type:** Scalar, string 
+
+<details><summary>Example</summary>
+
+The following example uses regular expressions to find city names inside the 
`OriginCityName` column from the `flight-carriers` datasource by matching what 
comes before the comma.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_expr",
+  REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_extracted"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_expr` | `pattern_extracted` |
+| -- | -- |
+| `San Juan, PR` | `San Juan`|
+
+</details>
 
-Applies a regular expression to the string expression and returns the _n_th 
match.
+[Learn more](sql-scalar.md#string-functions)
 
 ## REGEXP_LIKE
 
-`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
+Returns `true` if the regular expression `pattern` finds a match in `expr`. 
Returns `false` otherwise.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_LIKE(expr, pattern)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example returns `true` when the `OriginCityName` column from 
`flight-carriers` has a city name with two words.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_expr",
+  REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found"
+FROM "flight-carriers"
+LIMIT 2
+```
 
-Returns true or false signifying whether the regular expression finds a match 
in the string expression.
+Returns the following:
+
+| `original_expr` | `pattern_found` |
+| -- | -- |
+| `San Juan, PR` | `true` |
+| `Boston, MA` | `false` |
+
+</details>
+
+[Learn more](sql-scalar.md#string-functions)
 
 ## REGEXP_REPLACE
 
-`REGEXP_REPLACE(<CHARACTER>, <CHARACTER>, <CHARACTER>)`
+Replaces all occurrences of a regular expression in a string expression with a 
replacement string. The replacement string may refer to capture groups using 
`$1`, `$2`, etc.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example matches instances of the word `Fort` and replaces it 
with it's abbreviation `Ft.`
+
+```sql
+SELECT 
+  'Fort Lauderdale, FL' AS "original_expr",
+  REGEXP_REPLACE('Fort Lauderdale, FL', 'Fort', 'Ft.') AS 
"replace_fort_with_abbreviation"
+```
+
+Returns the following:
+
+| `original_expr` | `replace_fort_with_abbreviation` |

Review Comment:
   consider different column aliases. Have you been using abbreviations like 
"expr" regularly or have you been using the words spelled out?



##########
docs/querying/sql-functions.md:
##########
@@ -1753,11 +1922,32 @@ For more information, see [UNNEST](./sql.md#unnest).
 
 ## UPPER
 
-`UPPER(expr)`
+Returns the expression in uppercase.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `UPPER(expr)`
+* **Function type:** Scalar, string
 
-Returns the expression in uppercase.
+<details><summary>Example</summary>
+
+The following example converts the `OriginCityName` column from the 
`flight-carriers` datasource to uppercase.
+
+```sql
+SELECT 
+  "OriginCityName" AS "original_string",
+  UPPER("OriginCityName") AS "make_uppercase"
+FROM "flight-carriers"
+LIMIT 1
+```
+
+Returns the following:
+
+| `original_string` | `make_uppercase` |

Review Comment:
   `origin_city`, `uppercase`



##########
docs/querying/sql-functions.md:
##########
@@ -1414,11 +1562,32 @@ Returns the number of the row within the window 
starting from 1.
 
 ## RPAD
 
-`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
+Returns a string of size `length` from `expr`, right-padded with `chars`. The 
default value for `chars` is a space character. Truncates `expr` to `length` if 
`length` is shorter than the length of `expr`.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:** `RPAD(expr, length[, chars])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example inserts `+` characters as padding to the right of the 
`OriginState` column from the `flight-carriers` datasource, making the return 
expression a length of `5` characters.

Review Comment:
   why does the length of the return string matter?



##########
docs/querying/sql-scalar.md:
##########
@@ -105,11 +105,11 @@ String functions accept strings, and return a type 
appropriate to the function.
 |`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a 
registered [query-time lookup table](lookups.md) named `lookupName`. The 
optional constant `replaceMissingValueWith`, if provided, is returned when the 
`expr` is null or when the lookup does not contain a value for `expr`.<br /><br 
/>Lookups can also be queried directly using the [`lookup` 
schema](sql.md#from).|
 |`LOWER(expr)`|Returns `expr` in all lowercase.|
 |`UPPER(expr)`|Returns `expr` in all uppercase.|
-|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
left-padded with `chars`. If `length` is shorter than the length of `expr`, the 
result is `expr` which is truncated to `length`. The result will be null if 
either `expr` or `chars` is null. If `chars` is an empty string, no padding is 
added, however `expr` may be trimmed if necessary.|
-|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
right-padded with `chars`. If `length` is shorter than the length of `expr`, 
the result is `expr` which is truncated to `length`. The result will be null if 
either `expr` or `chars` is null. If `chars` is an empty string, no padding is 
added, however `expr` may be trimmed if necessary.|
+|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
left-padded with `chars`. If no `chars` is specified, `expr` is padded with 
space characters. If `length` is shorter than the length of `expr`, `expr` is 
truncated to `length`.  If `chars` is an empty string, no padding is added. The 
result will be null if either `expr` or `chars` is null.|
+|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
right-padded with `chars`. If no `chars` is specified, `expr` is padded with 
space characters. If `length` is shorter than the length of `expr`, `expr` is 
truncated to `length`.  If `chars` is an empty string, no padding is added. The 
result will be null if either `expr` or `chars` is null.|
 |`PARSE_LONG(string, [radix])`|Parses a string into a long (BIGINT) with the 
given radix, or 10 (decimal) if a radix is not provided.|
 |`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of `needle` 
within `haystack`, with indexes starting from 1. The search will begin at 
`fromIndex`, or 1 if `fromIndex` is not specified. If `needle` is not found, 
returns 0.|
-|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` 
to `expr` and extract a capture group, or `NULL` if there is no match. If index 
is unspecified or zero, returns the first substring that matched the pattern. 
The pattern may match anywhere inside `expr`; if you want to match the entire 
string instead, use the `^` and `$` markers at the start and end of your 
pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not 
possible to differentiate an empty-string match from a non-match (both will 
return `NULL`).|
+|`REGEXP_EXTRACT(expr, pattern, [N])`|Apply regular expression `pattern` to 
`expr` and extract the `N`-th capture group, or `NULL` if there is no match. If 
`N` is unspecified or zero, returns the first substring that matched the 
pattern. The pattern may match anywhere inside `expr`; if you want to match the 
entire string instead, use the `^` and `$` markers at the start and end of your 
pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not 
possible to differentiate an empty-string match from a non-match (both will 
return `NULL`).|

Review Comment:
   ```suggestion
   |`REGEXP_EXTRACT(expr, pattern, [N])`|Apply regular expression `pattern` to 
`expr` and extract the `N`-th capture group, or `NULL` if there is no match. If 
`N` is unspecified or zero, returns the first substring that matched the 
pattern. The pattern may match anywhere inside `expr`. To match the entire 
string, use the `^` and `$` markers at the start and end of your pattern. Note: 
when `druid.generic.useDefaultValueForNull = true`, it is not possible to 
differentiate an empty-string match from a non-match (both will return `NULL`).|
   ```



##########
docs/querying/sql-scalar.md:
##########
@@ -105,11 +105,11 @@ String functions accept strings, and return a type 
appropriate to the function.
 |`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a 
registered [query-time lookup table](lookups.md) named `lookupName`. The 
optional constant `replaceMissingValueWith`, if provided, is returned when the 
`expr` is null or when the lookup does not contain a value for `expr`.<br /><br 
/>Lookups can also be queried directly using the [`lookup` 
schema](sql.md#from).|
 |`LOWER(expr)`|Returns `expr` in all lowercase.|
 |`UPPER(expr)`|Returns `expr` in all uppercase.|
-|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
left-padded with `chars`. If `length` is shorter than the length of `expr`, the 
result is `expr` which is truncated to `length`. The result will be null if 
either `expr` or `chars` is null. If `chars` is an empty string, no padding is 
added, however `expr` may be trimmed if necessary.|
-|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
right-padded with `chars`. If `length` is shorter than the length of `expr`, 
the result is `expr` which is truncated to `length`. The result will be null if 
either `expr` or `chars` is null. If `chars` is an empty string, no padding is 
added, however `expr` may be trimmed if necessary.|
+|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` 
left-padded with `chars`. If no `chars` is specified, `expr` is padded with 
space characters. If `length` is shorter than the length of `expr`, `expr` is 
truncated to `length`.  If `chars` is an empty string, no padding is added. The 
result will be null if either `expr` or `chars` is null.|

Review Comment:
   ```suggestion
   |`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`. If 
`expr` is shorter than `length`, left pads `expr` with `chars`, which defaults 
to space characters. If `expr` exceeds `length`, truncates `expr` to equal 
`length`.  If `chars` is an empty string, no padding is added. Returns `null` 
if either `expr` or `chars` is null.|
   ```
   same for RPAD
   



##########
docs/querying/sql-functions.md:
##########
@@ -1293,11 +1337,31 @@ Parses `expr` into a `COMPLEX<json>` object. This 
operator deserializes JSON val
 
 ## PARSE_LONG
 
-`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
+Converts a string into a long(BIGINT) with the given radix, or into 
DECIMAL(base 10) if a radix is not provided.
 
-**Function type:** [Scalar, string](sql-scalar.md#string-functions)
+* **Syntax:**`PARSE_LONG(string, [radix])`
+* **Function type:** Scalar, string
+
+<details><summary>Example</summary>
+
+The following example converts the string representation of the binary, radix 
2, number `1100` into its long (BIGINT) equivalent.
+
+```sql
+SELECT 
+  '1100' AS "binary_number_as_string", 
+  PARSE_LONG('1110', 2) AS "parse_binary_string_into_BIGINT"
+```
+
+Returns the following:
+
+| `binary_number_as_string` | `parse_binary_string_into_BIGINT` |

Review Comment:
   you haven't been in the habit of using any caps in the output columns, so 
this feels inconsistent. also I'd encourage you to look at a way to be a hint 
more concise in the names if possible. would `binary_as_string` and 
`bigint_value` work? What about for the other functions with long output 
columns?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [Docs] Batch07: adding examples to string functions (druid)

Reply via email to