This is an automated email from the ASF dual-hosted git repository.

zclll pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 196c19c567f [Feature](func) Support function soundex (#2846)
196c19c567f is described below

commit 196c19c567ffa5f9da7d7417606739966ca79500
Author: linrrarity <[email protected]>
AuthorDate: Tue Sep 9 19:44:00 2025 +0800

    [Feature](func) Support function soundex (#2846)
    
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
    
    ---------
    
    Co-authored-by: linzhenqi <[email protected]>
---
 .../scalar-functions/string-functions/soundex.md   | 102 +++++++++++++++++++++
 .../scalar-functions/string-functions/soundex.md   | 100 ++++++++++++++++++++
 sidebars.json                                      |   1 +
 3 files changed, 203 insertions(+)

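The codes in the example tables of the new soundex.md can be cross-checked against a small reference model. The sketch below is a plain-Python rendering of American Soundex as the documentation in this commit describes it; it is not Doris's implementation. The name `soundex_sketch`, the use of `ValueError`, and the choice that ignored non-letter characters do not separate equal codes are assumptions made for illustration.

```python
from typing import Optional

# Standard American Soundex digit groups (vowels, Y, H and W carry no digit).
_CODES = {}
for _digit, _letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                         ("4", "L"), ("5", "MN"), ("6", "R")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_sketch(s: Optional[str]) -> Optional[str]:
    """Illustrative model of the documented behavior, not Doris's code."""
    if s is None:
        return None  # NULL in, NULL out
    out = ""
    prev = ""  # digit of the last coded letter; "" after a vowel
    for ch in s:
        if len(out) == 4:
            break  # code complete: later characters are never examined
        if ord(ch) > 127:
            # Assumption: mirror the documented error with a ValueError.
            raise ValueError("soundex only supports ASCII")
        if not ch.isalpha():
            continue  # non-letters are ignored (assumed not to reset prev)
        up = ch.upper()
        digit = _CODES.get(up, "")
        if not out:
            out, prev = up, digit  # keep the first letter as-is
            continue
        if up in "HW":
            continue  # H and W do not separate letters with equal codes
        if digit and digit != prev:
            out += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros to four characters; no letters at all gives "".
    return out.ljust(4, "0") if out else ""
```

Under these assumptions the sketch returns `D620` for `Doris`, `A261` for `Ashcraft`, and `A123` for `Apache Doris 你好`, and it raises for `Doris 你好`, matching the values shown in the documentation below.
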
diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
new file mode 100644
index 00000000000..aa5f15e64b3
--- /dev/null
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
@@ -0,0 +1,102 @@
+---
+{
+    "title": "SOUNDEX",
+    "language": "en"
+}
+---
+
+## Description
+
+The SOUNDEX function computes the [American Soundex](https://en.wikipedia.org/wiki/Soundex) code of the input string: the string's first letter followed by a three-digit sound code that represents its English pronunciation.
+
+The function ignores all non-letter characters in the string.
+
+## Syntax
+
+```sql
+SOUNDEX ( <expr> )
+```
+
+## Arguments
+
+| Argument | Description                |
+|----------|----------------------------|
+| `<expr>` | The string to encode. Only ASCII characters are accepted. |
+
+## Return Value
+
+Returns a VARCHAR(4) string consisting of an uppercase letter followed by a three-digit sound code representing English pronunciation.
+
+If the string is empty or contains no letters, an empty string is returned.
+
+If the function encounters a non-ASCII character before the four-character code is complete, it raises an error.
+
+If the input is NULL, NULL is returned.
+
+## Examples
+
+The following table simulates a list of names.
+```sql
+CREATE TABLE IF NOT EXISTS soundex_test (
+     name VARCHAR(20)
+) DISTRIBUTED BY HASH(name) BUCKETS 1
+PROPERTIES ("replication_num" = "1"); 
+
+INSERT INTO soundex_test (name) VALUES
+('Doris'),
+('Smith'), ('Smyth'),
+('H'), ('P'), ('Lee'), 
+('Robert'), ('R@b-e123rt'),
+('123@*%'), (''),
+('Ashcraft'), ('Honeyman'), ('Pfister'), (NULL);
+```
+
+```sql
+SELECT name, soundex(name) AS IDX FROM soundex_test;
+```
+```text
++------------+------+
+| NULL       | NULL |
+|            |      |
+| 123@*%     |      |
+| Ashcraft   | A261 |
+| Doris      | D620 |
+| H          | H000 |
+| Honeyman   | H555 |
+| Lee        | L000 |
+| P          | P000 |
+| Pfister    | P236 |
+| R@b-e123rt | R163 |
+| Robert     | R163 |
+| Smith      | S530 |
+| Smyth      | S530 |
++------------+------+
+```
+
+Behavior for non-ASCII characters:
+
+- Doris processes the input string character by character. If it encounters a non-ASCII character before the four-character code is complete, it throws an error. Once the code is complete, the remaining characters are not examined. Examples:
+
+```sql
+SELECT SOUNDEX('你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+
+```sql
+-- After processing `Doris` the function has produced D62 (one digit short of a complete 4-character code)
+-- When it then reads the non-ASCII character `你`, the function raises an error
+SELECT SOUNDEX('Doris 你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+
+```sql
+SELECT SOUNDEX('Apache Doris 你好');
+```
+
+```text
++--------------------------------+
+| SOUNDEX('Apache Doris 你好')   |
++--------------------------------+
+| A123                           |
++--------------------------------+
+```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
new file mode 100644
index 00000000000..888ca06b7bd
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
@@ -0,0 +1,100 @@
+---
+{
+    "title": "SOUNDEX",
+    "language": "zh-CN"
+}
+---
+
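+## Description
+
+The SOUNDEX function computes the [American Soundex](https://zh.wikipedia.org/zh-cn/Soundex) value, which consists of the first letter followed by a three-digit sound code.
+The code represents the English pronunciation of the string specified by the user.
+
+The function ignores all non-letter characters in the string.
+
+## Syntax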
+
+```sql
+SOUNDEX ( <expr> )
+```
+
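+## Arguments
+
+| Argument | Description |
+|----------|-------------|
+| `<expr>` | The string to encode. Only ASCII characters are accepted. |
+
+## Return Value
+
+Returns a VARCHAR(4) string consisting of an uppercase letter followed by a three-digit sound code representing English pronunciation.
+
+If the string is empty or contains no letters, an empty string is returned.
+
+If the function encounters a non-ASCII character before the four-character code is complete, it raises an error.
+
+If the input is NULL, NULL is returned.
+
+## Examples
+
+The following table simulates a list of names.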
+```sql
+CREATE TABLE IF NOT EXISTS soundex_test (
+     name VARCHAR(20)
+) DISTRIBUTED BY HASH(name) BUCKETS 1
+PROPERTIES ("replication_num" = "1"); 
+
+INSERT INTO soundex_test (name) VALUES
+('Doris'),
+('Smith'), ('Smyth'),
+('H'), ('P'), ('Lee'), 
+('Robert'), ('R@b-e123rt'),
+('123@*%'), (''),
+('Ashcraft'), ('Honeyman'), ('Pfister'), (NULL);
+```
+
+```sql
+SELECT name, soundex(name) AS IDX FROM soundex_test;
+```
+```text
++------------+------+
+| NULL       | NULL |
+|            |      |
+| 123@*%     |      |
+| Ashcraft   | A261 |
+| Doris      | D620 |
+| H          | H000 |
+| Honeyman   | H555 |
+| Lee        | L000 |
+| P          | P000 |
+| Pfister    | P236 |
+| R@b-e123rt | R163 |
+| Robert     | R163 |
+| Smith      | S530 |
+| Smyth      | S530 |
++------------+------+
+```
+
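+Behavior for non-ASCII characters:
+
+- Doris processes the input string character by character. If it encounters a non-ASCII character before the four-character code is complete, it throws an error immediately. Once the code is complete, the remaining characters are not examined. Examples: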
+
+```sql
+SELECT SOUNDEX('你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+```sql
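+-- After processing `Doris` the function has produced D62 (one digit short of a complete 4-character code)
+-- When it then reads the non-ASCII character `你`, the function raises an error
+SELECT SOUNDEX('Doris 你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII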
+```
+```sql
+SELECT SOUNDEX('Apache Doris 你好');
+```
+```text
++--------------------------------+
+| SOUNDEX('Apache Doris 你好')   |
++--------------------------------+
+| A123                           |
++--------------------------------+
+```
\ No newline at end of file
diff --git a/sidebars.json b/sidebars.json
index 2599e2cad2e..dd7841d6d50 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -1292,6 +1292,7 @@
                                         "sql-manual/sql-functions/scalar-functions/string-functions/rpad",
                                         "sql-manual/sql-functions/scalar-functions/string-functions/rtrim",
                                         "sql-manual/sql-functions/scalar-functions/string-functions/rtrim-in",
+                                        "sql-manual/sql-functions/scalar-functions/string-functions/soundex",
                                         "sql-manual/sql-functions/scalar-functions/string-functions/strleft",
                                         "sql-manual/sql-functions/scalar-functions/string-functions/strright",
                                         "sql-manual/sql-functions/scalar-functions/string-functions/split-by-regexp",

