This is an automated email from the ASF dual-hosted git repository.

zclll pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 2d9f2e3e0a3 [Enhancement](regexp) Support zero-width assertions in some regexp fu… (#3041)
2d9f2e3e0a3 is described below

commit 2d9f2e3e0a3a137cf8dcf096cd98671e1b696b9e
Author: linrrarity <[email protected]>
AuthorDate: Thu Nov 6 12:45:34 2025 +0800

    [Enhancement](regexp) Support zero-width assertions in some regexp fu… (#3041)
    
    …nctions
    
    ## Versions
    
    - [x] dev
    - [ ] 3.x
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../string-functions/regexp-extract-all.md         | 24 ++++++++++++++++++-
 .../string-functions/regexp-extract-or-null.md     | 24 ++++++++++++++++++-
 .../string-functions/regexp-extract.md             | 24 ++++++++++++++++++-
 .../scalar-functions/string-functions/regexp.md    | 27 ++++++++++++++++++++--
 .../string-functions/regexp-extract-all.md         | 25 +++++++++++++++++++-
 .../string-functions/regexp-extract-or-null.md     | 25 +++++++++++++++++++-
 .../string-functions/regexp-extract.md             | 25 +++++++++++++++++++-
 .../scalar-functions/string-functions/regexp.md    | 25 +++++++++++++++++++-
 8 files changed, 190 insertions(+), 9 deletions(-)

diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
index 3e5255fad0f..b7c13245d1b 100644
--- a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
@@ -13,7 +13,11 @@ It should be noted that when handling character set matching, Utf-8 standard cha
 
 If the 'pattern' is not allowed regexp regular, throw error;
 
-Support character match classes : https://github.com/google/re2/wiki/Syntax
+Default supported character match classes : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).
+
+Supported character matching types when the session variable `enable_extended_regex` is set to `true`: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
 
 ## Syntax
 
@@ -183,4 +187,22 @@ SELECT regexp_extract_all('hello (world) 123', '([[:alpha:]+');
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:alpha:]+
 Error: missing ]: [[:alpha:]+
+```
+
+Advanced regexp
+```sql
+SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=ID:)([A-Z]{2}-\d). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
+```
+```text
++-------------------------------------------------------------------------+
+| REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)') |
++-------------------------------------------------------------------------+
+| ['AA-1','BB-2','CC-3']                                                  |
++-------------------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
index 7659ec706ce..13bbd5f4372 100644
--- a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
@@ -16,7 +16,11 @@ Support since Apache Doris 3.0.2
 
 If the 'pattern' is not allowed regexp regular,throw error
 
-Support character match classes : https://github.com/google/re2/wiki/Syntax
+Default supported character match classes : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).
+
+Supported character matching types when the session variable `enable_extended_regex` is set to `true`: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
 
 ## Syntax
 
@@ -226,4 +230,22 @@ mysql> SELECT REGEXP_EXTRACT_OR_NULL('123AbCdExCx', '([[:lower:]]+)C([[]ower:]]+
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:lower:]]+)C([[:lower:]+)
 Error: missing ]: [[:lower:]+)
+```
+
+Advanced regexp
+```sql
+SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
+```
+```text
++-----------------------------------------------------------------+
+| regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1) |
++-----------------------------------------------------------------+
+| 123                                                             |
++-----------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
index ead16479ce2..f7b65042cf8 100644
--- a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
@@ -17,7 +17,11 @@ The `pos` parameter is of 'integer' type, used to specify the position in the st
 
 If the `pattern` is not allowed regexp regular,throw error;
 
-Support character match classes : https://github.com/google/re2/wiki/Syntax
+Default supported character match classes : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).
+
+Supported character matching types when the session variable `enable_extended_regex` is set to `true`: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
 
 ## Syntax
 ```sql
@@ -179,4 +183,22 @@ SELECT regexp_extract('AbCdE', '([[:digit:]]+', 1);
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:digit:]]+
 Error: missing ): ([[:digit:]]+
+```
+
+Advanced regexp
+```sql
+SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
+```
+```text
++---------------------------------------------------------------+
+| regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1) |
++---------------------------------------------------------------+
+| 123                                                           |
++---------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
index 112e3bce869..00fe2219894 100644
--- a/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
@@ -6,13 +6,20 @@
 ---
 
 ## Description
-
+~
 Performs a regular expression match on the string str, returning true if the match succeeds, otherwise false. pattern is the regular expression pattern.
 It should be noted that when handling character set matching, Utf-8 standard character classes should be used. This ensures that functions can correctly identify and process various characters from different languages.
 
 If the `pattern` is not allowed regexp regular,throw error;
 
-Support character match classes : https://github.com/google/re2/wiki/Syntax
+Default supported character match classes : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).
+
+Supported character matching types when the session variable `enable_extended_regex` is set to `true`: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
+
+Note: After enabling this variable, performance will only be affected when the regular expression contains advanced syntax (such as look-around). Therefore, for better performance, it is recommended to optimize your regular expressions as much as possible and avoid using such zero-width assertions.
+
 
 ## Syntax
 
@@ -191,4 +198,20 @@ SELECT REGEXP('Hello, World!', '([a-z');
 
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Invalid regex expression: ([a-z
+```
+
+Advanced regexp
+```sql
+SELECT REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Invalid regex expression: ([a-zA-Z_+-]+(?:/[a-zA-Z_0-9+-]+)*)(?=s|$). Error: invalid perl operator: (?=
+
+SET enable_extended_regex = true;
+SELECT REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)');
+```
+```text
++-----------------------------------------------------------------------+
+| REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)') |
++-----------------------------------------------------------------------+
+|                                                                     1 |
++-----------------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
index 884fb9269dd..8d214b8b05c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md
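@@ -32,7 +32,12 @@ The REGEXP_EXTRACT_ALL function performs a regular expression match on the given string str
 
 If the 'pattern' argument is not a valid regular expression, an error is thrown
 
-Supported character match classes : https://github.com/google/re2/wiki/Syntax
+Character match classes supported by default : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default `false`).
+
+When the session variable `enable_extended_regex` is set to `true`,
+supported character match classes : https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html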
 
 ## 语法
 
@@ -201,4 +206,22 @@ SELECT regexp_extract_all('hello (world) 123', '([[:alpha:]+');
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:alpha:]+
 Error: missing ]: [[:alpha:]+
+```
+
+Advanced regular expressions
+```sql
+SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=ID:)([A-Z]{2}-\d). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
+```
+```text
++-------------------------------------------------------------------------+
+| REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)') |
++-------------------------------------------------------------------------+
+| ['AA-1','BB-2','CC-3']                                                  |
++-------------------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
index b7494fcd4d6..709f31c6fcf 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-or-null.md
@@ -17,7 +17,12 @@
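 Supported since Apache Doris 3.0.2
 :::
 
-Supported character match classes : https://github.com/google/re2/wiki/Syntax
+Character match classes supported by default : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default `false`).
+
+When the session variable `enable_extended_regex` is set to `true`,
+supported character match classes : https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html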
 
 ## 语法
 
@@ -227,4 +232,22 @@ mysql> SELECT REGEXP_EXTRACT_OR_NULL('123AbCdExCx', '([[:lower:]]+)C([[]ower:]]+
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:lower:]]+)C([[:lower:]+)
 Error: missing ]: [[:lower:]+)
+```
+
+Advanced regular expressions
+```sql
+SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
+```
+```text
++-----------------------------------------------------------------+
+| regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1) |
++-----------------------------------------------------------------+
+| 123                                                             |
++-----------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
index 3c6f36ee3f5..2a3194fef9e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract.md
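@@ -37,7 +37,12 @@ The pos parameter is of 'integer' type and specifies where in the string to start searching for the regular expression
 
 If the 'pattern' argument is not a valid regular expression, an error is thrown
 
-Supported character match classes : https://github.com/google/re2/wiki/Syntax
+Character match classes supported by default : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default `false`).
+
+When the session variable `enable_extended_regex` is set to `true`,
+supported character match classes : https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html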
 
 ## 语法
 ```sql
@@ -201,4 +206,22 @@ SELECT regexp_extract('AbCdE', '([[:digit:]]+', 1);
 ```text
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:digit:]]+
 Error: missing ): ([[:digit:]]+
+```
+
+Advanced regular expressions
+```sql
+SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
+```
+
+```sql
+SET enable_extended_regex = true;
+SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
+```
+```text
++---------------------------------------------------------------+
+| regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1) |
++---------------------------------------------------------------+
+| 123                                                           |
++---------------------------------------------------------------+
 ```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
index 2df2c1bac10..ab779142a23 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp.md
@@ -12,7 +12,15 @@
 
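 If the 'pattern' argument is not a valid regular expression, an error is thrown
 
-Supported character match classes : https://github.com/google/re2/wiki/Syntax
+Character match classes supported by default : https://github.com/google/re2/wiki/Syntax
+
+Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default `false`).
+
+When the session variable `enable_extended_regex` is set to `true`,
+supported character match classes : https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
+
+
+Note: After this variable is enabled, performance is affected only when the regular expression contains advanced syntax (such as look-around). For better performance, optimize your regular expressions as much as possible and avoid such zero-width assertions.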
 
 ## 语法
 
@@ -192,3 +200,18 @@ SELECT REGEXP('Hello, World!', '([a-z');
 ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Invalid regex expression: ([a-z
 ```
 
+Advanced regular expressions
+```sql
+SELECT regexp('foobar', '(?<=foo)bar');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Invalid regex expression: ([a-zA-Z_+-]+(?:/[a-zA-Z_0-9+-]+)*)(?=s|$). Error: invalid perl operator: (?<
+
+SET enable_extended_regex = true;
+SELECT regexp('foobar', '(?<=foo)bar');
+```
+```text
++---------------------------------+
+| regexp('foobar', '(?<=foo)bar') |
++---------------------------------+
+|                               1 |
++---------------------------------+
+```
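
For readers who want to check what the look-around patterns in the examples above match without a Doris session, the following is a minimal sketch using Python's standard `re` module. This is an assumption made for illustration only: `re` is neither RE2 nor Boost.Regex, and the variable names are hypothetical. It is used here because it accepts the same lookbehind `(?<=...)` and lookahead `(?=...)` syntax for these particular patterns. It says nothing about Doris performance or about the `enable_extended_regex` switch itself.

```python
import re

# Stand-in engine only: Python's `re`, not the RE2 or Boost.Regex engines
# referenced in the Doris docs. Patterns and inputs are copied from the diff.

# Lookbehind: two capitals, a hyphen and a digit, only when preceded by "ID:".
ids = re.findall(r"(?<=ID:)([A-Z]{2}-\d)", "ID:AA-1,ID:BB-2,ID:CC-3")
print(ids)  # ['AA-1', 'BB-2', 'CC-3']

# Lookbehind plus lookahead: the digits that sit between "foo" and "bar".
m = re.search(r"(?<=foo)(\d+)(?=bar)", "foo123bar456baz")
print(m.group(1))  # 123

# Zero-width: the assertion checks "foo" but does not consume it,
# so the match covers only "bar" at offsets 3..6.
m2 = re.search(r"(?<=foo)bar", "foobar")
print(m2.group(0), m2.span())  # bar (3, 6)
```

The third case shows why these are called zero-width assertions: the matched span excludes the text checked by the assertion.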

