Re: [PR] Fix All Regexp functions Documention [doris-website]

via GitHub Fri, 04 Jul 2025 07:45:04 -0700


zclllyybb commented on code in PR #2568:
URL: https://github.com/apache/doris-website/pull/2568#discussion_r2185483645



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-count.md:
##########
@@ -49,10 +51,11 @@ REGEXP_COUNT(<str>, <pattern>)
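 ## Return Value
 
 - Returns the number of characters in the string "str" that match the regular expression "pattern"; the return type is "int". If no characters match, returns 0.
-- 1. If 'str' or 'pattern' is NULL, or both are NULL, returns NULL;
-- 2. If 'pattern' is not a valid regular expression, the call is invalid and an error is thrown;
 
-### Testing the result of matching a string against an expression containing escape characters
+- If 'str' or 'pattern' is NULL, or both are NULL, returns NULL;
+- If 'pattern' is not a valid regular expression, the call is invalid and an error is thrown;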

Review Comment:
   The examples below don't include this case.
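A minimal Python sketch of the return-value rules above, which may help in picking the missing example cases (NULL inputs, invalid pattern). It models the documented contract with Python's `re` module, not Doris's RE2-based implementation. The name `regexp_count_model` is hypothetical, and the sketch assumes the function counts non-overlapping matches. That count coincides with the character count for single-character patterns such as `'e'`.

```python
import re
from typing import Optional


def regexp_count_model(s: Optional[str], pattern: Optional[str]) -> Optional[int]:
    """Hypothetical model of the documented REGEXP_COUNT rules (not Doris code)."""
    # Rule: NULL in either argument (or both) yields NULL.
    if s is None or pattern is None:
        return None
    # Rule: an invalid pattern is an error; re.compile raises re.error here.
    compiled = re.compile(pattern)
    # Count non-overlapping matches; the result is 0 when nothing matches.
    return sum(1 for _ in compiled.finditer(s))


# Cases the documentation examples could cover:
print(regexp_count_model("hello", "l"))   # 2
print(regexp_count_model(None, "l"))      # None
print(regexp_count_model("hello", None))  # None
print(regexp_count_model("hello", "z"))   # 0
try:
    regexp_count_model("hello", "(")      # unbalanced group -> invalid pattern
except re.error as exc:
    print("error:", exc)
```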



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-count.md:
##########
@@ -216,4 +227,20 @@ SELECT id, regexp_count(text_data, 'e') as count_e FROM test_table_for_regexp_co
 |    9 |       0 |
 |   10 |       1 |
 +------+---------+
+
+```
+
+Emoticon matching

Review Comment:
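   This wording is imprecise: these are emoji characters. Emoji can be combined (several emoji can join into a single composite emoji), so you should test whether a component emoji matches a composite emoji.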
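To illustrate the combining behavior the reviewer mentions: many emoji are sequences of code points joined by ZERO WIDTH JOINER (U+200D) or followed by a modifier. A regex engine therefore sees several code points, and a pattern for one component can match inside the composite. The sketch uses Python's `re` as a stand-in. Whether Doris (RE2 over UTF-8) behaves the same way is an assumption that the documentation test itself should confirm.

```python
import re

# "Family: man, woman, girl" is three emoji joined by ZERO WIDTH JOINER (U+200D).
MAN, WOMAN, GIRL, ZWJ = "\U0001F468", "\U0001F469", "\U0001F467", "\u200d"
family = MAN + ZWJ + WOMAN + ZWJ + GIRL

# "Thumbs up" followed by a medium skin-tone modifier (U+1F3FD).
thumbs_medium = "\U0001F44D\U0001F3FD"

# The composite renders as one glyph but is 5 code points long.
print(len(family))                                   # 5
# A pattern for a single component emoji is found inside the composite.
print(len(re.findall(MAN, family)))                  # 1
print(len(re.findall("\U0001F44D", thumbs_medium)))  # 1
# The full composite still matches itself as one unit.
print(len(re.findall(re.escape(family), family)))    # 1
```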



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md:
##########
@@ -5,21 +5,52 @@
 }
 ---
 
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
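 ## Description
 
-Performs regular expression matching on the string str and extracts every part that matches a sub-pattern of pattern. pattern must fully match some portion of str for the function to return an array of strings containing the parts of pattern that need to be matched. If there is no match, or pattern has no sub-pattern, an empty string is returned.
+The REGEXP_EXTRACT_ALL function performs regular expression matching on the given string str and extracts all parts that match the first sub-pattern of the specified pattern. For the function to return a string array representing the matched parts of the pattern, the pattern must fully match a portion of the input string str. If there are no matches, or the pattern contains no sub-pattern, an empty string is returned.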

Review Comment:
   There must be a space between every piece of Chinese and English text.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-extract-all.md:
##########
@@ -5,21 +5,52 @@
 }
 ---
 
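 ## Description
 
-Performs regular expression matching on the string str and extracts every part that matches a sub-pattern of pattern. pattern must fully match some portion of str for the function to return an array of strings containing the parts of pattern that need to be matched. If there is no match, or pattern has no sub-pattern, an empty string is returned.
+The REGEXP_EXTRACT_ALL function performs regular expression matching on the given string str and extracts all parts that match the first sub-pattern of the specified pattern. For the function to return a string array representing the matched parts of the pattern, the pattern must fully match a portion of the input string str. If there are no matches, or the pattern contains no sub-pattern, an empty string is returned.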

Review Comment:
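   Also, the statement "extracts all parts that match the first sub-pattern of the specified pattern" is wrong.
   It should be "within every substring that matches the specified pattern, the portion that matches the first sub-pattern".
   The two give different results: your original wording does not require the whole pattern to produce a valid match.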
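The difference the reviewer describes can be shown with Python's `re`. The sketch assumes REGEXP_EXTRACT_ALL follows the reviewer's reading, which the Doris examples should confirm. Under that reading, only the group-1 text of each full match is returned. The input string and pattern are hypothetical.

```python
import re

text = "10px 20em 30px"
pattern = r"(\d+)px"

# Reviewer's reading: for each full match of the pattern, keep only group 1.
# "20" is excluded because "20em" never completes a match of the whole pattern.
group1_of_full_matches = [m.group(1) for m in re.finditer(pattern, text)]

# Original wording: every substring that matches the first sub-pattern by itself.
first_subpattern = r"\d+"
matches_of_subpattern_alone = re.findall(first_subpattern, text)

print(group1_of_full_matches)       # ['10', '30']
print(matches_of_subpattern_alone)  # ['10', '20', '30']
```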



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/regexp-count.md:
##########
@@ -25,11 +25,13 @@ under the License.
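 ## Description
 This function counts the number of characters in a string that match a given regular expression pattern. The inputs are a user-supplied string and a regular expression pattern. The return value is the total number of matching characters; if no match is found, it returns 0.
 
-1. The 'str' parameter is of type "string" and is the string the user wants to match with the regular expression.
+Note that when matching character sets, standard UTF-8 character classes should be used. This ensures that the function correctly recognizes and handles characters from different languages.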

Review Comment:
   You should give a link to RE2 showing which character classes we support.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

