(doris) branch master updated: [docs] (DebugPoints) Update docs about Debug Points (#28347)

dataroaring Sun, 24 Dec 2023 17:33:59 -0800

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new ff365ca1303 [docs] (DebugPoints) Update docs about Debug Points (#28347)
ff365ca1303 is described below

commit ff365ca13034f4df84a006c2507bfd6a91c150d4
Author: HowardQin <[email protected]>
AuthorDate: Mon Dec 25 09:33:47 2023 +0800

    [docs] (DebugPoints) Update docs about Debug Points (#28347)
    
    
    ---------
    
    Co-authored-by: qinhao <[email protected]>
---
 .../http-actions/fe/debug-point-action.md          | 243 +++++++++++++++++----
 .../http-actions/fe/debug-point-action.md          | 215 ++++++++++++++----
 2 files changed, 376 insertions(+), 82 deletions(-)

diff --git a/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md b/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md
index cac2afdcd25..84ad9bf324a 100644
--- a/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md
+++ b/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md
@@ -26,9 +26,17 @@ under the License.
 
 # Debug Point
 
-Debug point is used in code test. When enabling a debug point, it can run 
related code.
+A debug point is a piece of code inserted into FE or BE code. When the program reaches this code, it can change variables or behavior of the program.
 
-Both FE and BE support debug points.
+It is mainly used in unit tests or regression tests, when some exceptions cannot be triggered through normal means.
+
+Each debug point has a name, which can be anything you like. There are switches to enable and disable debug points, and you can also pass data to them.
+
+Both FE and BE support debug points. After inserting debug point code, FE or BE must be recompiled.
 
 ## Code Example
 
@@ -36,8 +44,8 @@ FE example
 
 ```java
 private Status foo() {
-       // dbug_fe_foo_do_nothing is the debug point name.
-       // When it's active，DebugPointUtil.isEnable("dbug_fe_foo_do_nothing") 
will return true.
+       // dbug_fe_foo_do_nothing is the debug point name
+       // when it's active, DebugPointUtil.isEnable("dbug_fe_foo_do_nothing") 
returns true
        if (DebugPointUtil.isEnable("dbug_fe_foo_do_nothing")) {
        return Status.Nothing;
     }
@@ -48,13 +56,13 @@ private Status foo() {
 }
 ```
 
-BE 桩子示例代码
+BE example
 
 ```c++
 void Status foo() {
-     // dbug_be_foo_do_nothing is the debug point name.
-     // When it's active，DEBUG_EXECUTE_IF will execute the code block.
-     DEBUG_EXECUTE_IF("dbug_be_foo_do_nothing",  { return Status.Nothing; });
+     // dbug_be_foo_do_nothing is the debug point name
+     // when it's active, DBUG_EXECUTE_IF will execute the code block
+     DBUG_EXECUTE_IF("dbug_be_foo_do_nothing",  { return Status.Nothing; });
    
      do_foo_action();
      
@@ -62,32 +70,36 @@ void Status foo() {
 }
 ```
 
-## Global config
 
-To activate debug points, need set `enable_debug_points` to true.
+## Global Config
+
+To enable debug points globally, set `enable_debug_points` to true.
+
+`enable_debug_points` is located in FE's fe.conf and BE's be.conf.
 
-`enable_debug_points` was located in FE's fe.conf and BE's be.conf。
 
+## Activate A Specified Debug Point
 
-## Enable Debug Point
+After debug points are enabled globally, an HTTP request with the debug point name must be sent to the FE or BE node. <br/>
+Only then will the related code be executed when the program reaches the specified debug point.
 
 ### API
 
 ```
-       POST 
/api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
+POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
 ```
 
 
 ### Query Parameters
 
 * `debug_point_name`
-    Debug point name. Require.
+    Debug point name. Required.
 
 * `timeout`
-    Timeout in seconds. When timeout, the debug point will be disable. Default 
is -1,  not timeout. Optional.
+    Timeout in seconds. After the timeout elapses, the debug point is deactivated. Default is -1, meaning it never times out. Optional.
 
 * `execute`
-    Max active times。Default is -1,  unlimit active times. Optional.  
+    Maximum number of times the debug point can be executed after activation. Default is -1, meaning unlimited. Optional.
 
 
 ### Request body
@@ -96,24 +108,105 @@ None
 
 ### Response
 
-    ```
-    {
-        msg: "OK",
-        code: 0
-    }
-    ```
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
     
 ### Examples
 
 
-Enable debug point `foo`, activate no more than five times.
+Activate debug point `foo` and allow it to execute no more than five times.
        
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5";
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5";
+
+```
+
+
+## Pass Custom Parameters
+When activating a debug point, custom parameters can be passed in addition to the "timeout" and "execute" parameters mentioned above.<br/>
+A parameter is a key-value pair of the form "key=value" in the URL's query string, which follows the debug point name and is introduced by the character '?'.<br/>
+See the examples below.
+
+### API
+
+```
+POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...]
+```
+* `k1=v1` <br/>
+  k1 is parameter name <br/>
+  v1 is parameter value <br/>
+  multiple key-value pairs are concatenated by `&` <br/>
+  
+
+  
+### Request body
+
+None
+
+### Response
+
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
+
+### Examples
+Assuming an FE node configured with http_port=8030 in fe.conf, <br/>
+the following HTTP request activates a debug point named `foo` on the FE node and passes the parameters `percent` and `duration`:
+>NOTE: A user name and password may be needed.
+```
+curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3";
+```
+
+```
+NOTE:
+1. Inside FE and BE code, parameter names and values are treated as strings.
+2. Parameter names and values are case sensitive in both the HTTP request and FE/BE code.
+3. FE and BE share the same REST API URL paths; only their IPs and ports differ.
+```
+
+### Use parameters in FE and BE code
+The following request activates debug point `OlapTableSink.write_random_choose_sink` in FE and passes the parameters `needCatchUp` and `sinkNum`:
+```
+curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3";
+```
+
+The code in FE checks debug point `OlapTableSink.write_random_choose_sink` and gets the parameter values:
+```java
+private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) {
+    DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink");
+    if (debugPoint == null) {
+        return;
+    }
+    boolean needCatchup = debugPoint.param("needCatchUp", false);
+    int sinkNum = debugPoint.param("sinkNum", 0);
+    ...
+}
+```
+
+The following request activates debug point `TxnManager.prepare_txn.random_failed` in BE and passes the parameter `percent`:
+```
+curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7"
+```
+
+The code in BE checks debug point `TxnManager.prepare_txn.random_failed` and gets the parameter value:
+```c++
+DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed",
+               {if (rand() % 100 < (100 * dp->param("percent", 0.5))) {
+                       LOG_WARNING("TxnManager.prepare_txn.random_failed random failed");
+                       return Status::InternalError("debug prepare txn random failed");
+               }}
+);
+```
+
 
-    ```
-    
 ## Disable Debug Point
 
 ### API
@@ -137,10 +230,10 @@ None
 ### Response
 
 ```
-    {
-        msg: "OK",
-        code: 0
-    }
+{
+    msg: "OK",
+    code: 0
+}
 ```
     
 ### Examples
@@ -149,17 +242,17 @@ None
 Disable debug point `foo`.
        
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo";
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo";
 
-    ```
+```
     
 ## Clear Debug Points
 
 ### API
 
 ```
-       POST /api/debug_point/clear
+POST /api/debug_point/clear
 ```
 
 
@@ -170,16 +263,78 @@ None
 
 ### Response
 
-    ```
-    {
-        msg: "OK",
-        code: 0
-    }
-    ```
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
     
 ### Examples
 
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/clear";
-    ```
\ No newline at end of file
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/clear";
+```
+
+## Debug Points in Regression Test
+
+>In the community's CI system, the `enable_debug_points` configuration of FE and BE is true by default.
+
+The regression test framework also provides methods to activate and deactivate a particular debug point. <br/>
+They are declared as follows:
+```groovy
+// "name" is the debug point to activate, "params" is a map of key-value pairs passed to the debug point
+def enableDebugPointForAllFEs(String name, Map<String, String> params = null);
+def enableDebugPointForAllBEs(String name, Map<String, String> params = null);
+// "name" is the debug point to deactivate
+def disableDebugPointForAllFEs(String name);
+def disableDebugPointForAllBEs(String name);
+```
+`enableDebugPointForAllFEs()` or `enableDebugPointForAllBEs()` must be called before the test actions in which you want to generate errors, <br/>
+and `disableDebugPointForAllFEs()` or `disableDebugPointForAllBEs()` must be called afterward.
+
+### Concurrency Issue
+
+Enabled debug points affect FE or BE globally, which could cause other concurrent tests in your pull request to fail unexpectedly. <br/>
+To avoid this, there is a convention that regression tests using debug points must be placed in the directory regression-test/suites/fault_injection_p0, <br/>
+and their group name must be "nonConcurrent", because the pull request workflow executes these regression tests serially.
+
+### Examples
+
+```groovy
+// .groovy file of the test case must be in regression-test/suites/fault_injection_p0
+// and the group name must be 'nonConcurrent'
+suite('debugpoint_action', 'nonConcurrent') {
+    try {
+        // Activate debug point named "PublishVersionDaemon.stop_publish" in all FE
+        // and pass parameter "timeout"
+        // "execute" and "timeout" are pre-existing parameters, usage is mentioned above
+        GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1])
+
+        // Activate debug point named "Tablet.build_tablet_report_info.version_miss" in all BE
+        // and pass parameter "tablet_id", "version_miss" and "timeout"
+        GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss',
+                                                  [tablet_id:'12345', version_miss:true, timeout:1])
+
+        // Test actions which will run into debug point and generate error
+        sql """CREATE TABLE tbl_1 (k1 INT, k2 INT)
+               DUPLICATE KEY (k1)
+               DISTRIBUTED BY HASH(k1)
+               BUCKETS 3
+               PROPERTIES ("replication_allocation" = "tag.location.default: 1");
+            """
+        sql "INSERT INTO tbl_1 VALUES (1, 10)"
+        sql "INSERT INTO tbl_1 VALUES (2, 20)"
+        order_qt_select_1_1 'SELECT * FROM tbl_1'
+
+    } finally {
+        // Deactivate debug points
+        GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish')
+        GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss')
+    }
+}
+```
+
+
+
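The three endpoints documented above (`/api/debug_point/add/{debug_point_name}`, `/api/debug_point/remove/{debug_point_name}` and `/api/debug_point/clear`) differ only in path and query string, so a test helper can assemble them mechanically. The following is a minimal Java sketch, not part of Doris: the class `DebugPointUrls` and its methods are hypothetical names introduced for illustration, and it only builds the URL strings that the curl examples send with `-X POST`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Doris class): builds the debug point REST URLs.
public class DebugPointUrls {
    private final String base;

    public DebugPointUrls(String host, int httpPort) {
        this.base = "http://" + host + ":" + httpPort + "/api/debug_point";
    }

    // URL that activates a debug point; params may hold "timeout", "execute"
    // or custom keys. Names and values are sent as case-sensitive strings.
    public String add(String name, Map<String, String> params) {
        String url = base + "/add/" + name;
        if (params == null || params.isEmpty()) {
            return url;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return url + query;
    }

    // URL that deactivates one debug point.
    public String remove(String name) {
        return base + "/remove/" + name;
    }

    // URL that clears all debug points.
    public String clear() {
        return base + "/clear";
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DebugPointUrls fe = new DebugPointUrls("127.0.0.1", 8030);
        Map<String, String> params = new LinkedHashMap<>();
        params.put("percent", "0.5");
        params.put("duration", "3");
        System.out.println(fe.add("foo", params));
        System.out.println(fe.remove("foo"));
        System.out.println(fe.clear());
    }
}
```

A `LinkedHashMap` is used so that the query parameters appear in insertion order, which keeps the generated URL identical to the curl example in the document. The same helper works for a BE node by passing the BE address and port (8040 in the examples above).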
diff --git a/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md b/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md
index a1c9a59a35b..df68ac003c8 100644
--- a/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md
+++ b/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md
@@ -26,9 +26,13 @@ under the License.
 
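 # Debug Point
 
-Debug points are used for code testing. After a debug point is activated, its code can be executed. A debug point can have any name.
+Debug point instrumentation means inserting a piece of code into the FE or BE source code. When the program reaches it, the code can change the program's variables or behavior. Such a piece of code is called a `debug point`.
 
-Both FE and BE support debug points.
+It is mainly used in unit tests or regression tests to construct exceptions that cannot be produced by normal means.
+
+Each debug point has a name, which can be chosen freely. Debug points can be switched on and off through certain mechanisms, and parameters can be passed to them.
+
+Both FE and BE support debug points. After adding debug point code, BE or FE must be recompiled.
 
 ## Debug Point Code Example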
 
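@@ -54,8 +58,8 @@ BE debug point example code
 void Status foo() {
 
      // dbug_be_foo_do_nothing is a debug point name,
-     // after this debug point is enabled, DEBUG_EXECUTE_IF will execute the code block in the macro argument
-     DEBUG_EXECUTE_IF("dbug_be_foo_do_nothing",  { return Status.Nothing; });
+     // after this debug point is enabled, DBUG_EXECUTE_IF will execute the code block in the macro argument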
+     DBUG_EXECUTE_IF("dbug_be_foo_do_nothing",  { return Status.Nothing; });
    
      do_foo_action();
      
@@ -71,11 +75,12 @@ void Status foo() {
 
 
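 ## Enable a Debug Point
+After the global switch is turned on, you also need to enable or disable a debug point with a given name by sending an HTTP request to FE or BE. Only then will the related code be executed when the program reaches that debug point.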
 
 ### API
 
 ```
-       POST 
/api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
+POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
 ```
 
 
@@ -85,10 +90,10 @@ void Status foo() {
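     Debug point name. Required.
 
 * `timeout`
-    Timeout in seconds. After the timeout, the debug point is deactivated. The default value -1 means it never times out. May be provided.
+    Timeout in seconds. After the timeout, the debug point is deactivated. The default value -1 means it never times out. Optional.
 
 * `execute`
-    Maximum number of times the debug point can be activated. The default value -1 means unlimited activations. May be provided.
+    Maximum number of times the debug point can be executed. The default value -1 means unlimited executions. Optional.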
 
 
 ### Request body
@@ -97,30 +102,109 @@ void Status foo() {
 
 ### Response
 
-    ```
-    {
-        msg: "OK",
-        code: 0
-    }
-    ```
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
     
 ### Examples
 
 
-Enable debug point `foo`, activated at most 5 times.
+Enable debug point `foo`, executed at most 5 times.
        
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5";
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5";
+
+```
 
-    ```
     
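+## Pass Parameters to a Debug Point
+
+When activating a debug point, other custom parameters can be passed in addition to the timeout and execute parameters described above.<br/>
+A parameter is a key-value pair of the form key=value, placed in the URL right after the debug point name and introduced by the character '?'.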
+
+### API
+
+```
+POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...]
+```
+* `k1=v1`
+  k1 is the parameter name and v1 is the parameter value; multiple parameters are separated by &.
+  
+### Request body
+
+None
+
+### Response
+
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
+
+### Examples
+
+Assuming FE is configured with http_port=8030 in fe.conf, the following request activates debug point `foo` in FE and passes two parameters, `percent` and `duration`:
+               
+```
+curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3";
+```
+
+```
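+Note:
+1. In FE or BE code, parameter names and values are both strings.
+2. Parameter names and values are case sensitive in both FE/BE code and the HTTP request.
+3. HTTP requests sent to FE and BE use the same path format; only the IP address and port differ.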
+```
+
+### Use Parameters in FE and BE Code
+
+Activate debug point `OlapTableSink.write_random_choose_sink` in FE and pass the parameters `needCatchUp` and `sinkNum`:
+>Note: a user name and password may be needed
+```
+curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3";
+```
+
+Use the parameters `needCatchUp` and `sinkNum` of debug point `OlapTableSink.write_random_choose_sink` in FE code:
+```java
+private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) {
+    DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink");
+    if (debugPoint == null) {
+        return;
+    }
+    boolean needCatchup = debugPoint.param("needCatchUp", false);
+    int sinkNum = debugPoint.param("sinkNum", 0);
+    ...
+}
+```
+
+
+Activate debug point `TxnManager.prepare_txn.random_failed` in BE and pass the parameter `percent`:
+```
+curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7"
+```
+Use the parameter `percent` of debug point `TxnManager.prepare_txn.random_failed` in BE code:
+```c++
+DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed",
+               {if (rand() % 100 < (100 * dp->param("percent", 0.5))) {
+                       LOG_WARNING("TxnManager.prepare_txn.random_failed random failed");
+                       return Status::InternalError("debug prepare txn random failed");
+               }}
+);
+```
+
+
 ## Disable a Debug Point
 
 ### API
 
 ```
-       POST /api/debug_point/remove/{debug_point_name}
+POST /api/debug_point/remove/{debug_point_name}
 ```
 
 
@@ -137,10 +221,10 @@ void Status foo() {
 ### Response
 
 ```
-    {
-        msg: "OK",
-        code: 0
-    }
+{
+    msg: "OK",
+    code: 0
+}
 ```
     
 ### Examples
@@ -149,39 +233,94 @@ void Status foo() {
 Disable debug point `foo`.
        
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo";
-
-    ```
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo";
+```
     
 ## Clear All Debug Points
 
 ### API
 
 ```
-       POST /api/debug_point/clear
+POST /api/debug_point/clear
 ```
 
-
-
 ### Request body
 
 None
 
 ### Response
 
-    ```
-    {
-        msg: "OK",
-        code: 0
-    }
-    ```
+```
+{
+    msg: "OK",
+    code: 0
+}
+```
     
 ### Examples
 
 
 Clear all debug points.
        
-    ```
-    curl -X POST "http://127.0.0.1:8030/api/debug_point/clear";
-    ```
+```
+curl -X POST "http://127.0.0.1:8030/api/debug_point/clear";
+```
+
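+## Use Debug Points in Regression Tests
+
+> When a PR is submitted, the community CI system enables the `enable_debug_points` configuration of FE and BE by default.
+
+The regression test framework provides methods to enable and disable a specified debug point. They are declared as follows:
+
+```groovy
+// Enable a debug point; name is the debug point name, params is a map of key-value pairs passed to the debug point
+def enableDebugPointForAllFEs(String name, Map<String, String> params = null);
+def enableDebugPointForAllBEs(String name, Map<String, String> params = null);
+// Disable a debug point; name is the debug point name
+def disableDebugPointForAllFEs(String name);
+def disableDebugPointForAllBEs(String name);
+```
+Call `enableDebugPointForAllFEs()` or `enableDebugPointForAllBEs()` before the test actions to enable the debug point, <br/>
+so that the related code is executed when the program reaches the debug point code, <br/>
+then call `disableDebugPointForAllFEs()` or `disableDebugPointForAllBEs()` after the test actions to disable the debug point.
+
+### Concurrency Issue
+
+Debug points enabled in FE or BE take effect globally, so other tests running concurrently in the same pull request may be affected and fail unexpectedly.
+To avoid this, the rule is that regression tests using debug points must be placed in the regression-test/suites/fault_injection_p0 directory,
+and their group name must be set to `nonConcurrent`. The community CI system runs these cases serially.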
+
+### Examples
+
+```groovy
+// The .groovy file of the test case must be placed in regression-test/suites/fault_injection_p0,
+// and the group name must be set to 'nonConcurrent'
+suite('debugpoint_action', 'nonConcurrent') {
+    try {
+        // Enable the debug point named "PublishVersionDaemon.stop_publish" in all FEs
+        // and pass the parameter timeout
+        // As with the curl calls above, execute is the number of executions and timeout is the timeout in seconds
+        GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1])
+        // Enable the debug point named "Tablet.build_tablet_report_info.version_miss" in all BEs
+        // and pass the parameters tablet_id, version_miss and timeout
+        GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss',
+                                                  [tablet_id:'12345', version_miss:true, timeout:1])
+
+        // Test actions that trigger execution of the debug point code
+        sql """CREATE TABLE tbl_1 (k1 INT, k2 INT)
+               DUPLICATE KEY (k1)
+               DISTRIBUTED BY HASH(k1)
+               BUCKETS 3
+               PROPERTIES ("replication_allocation" = "tag.location.default: 1");
+            """
+        sql "INSERT INTO tbl_1 VALUES (1, 10)"
+        sql "INSERT INTO tbl_1 VALUES (2, 20)"
+        order_qt_select_1_1 'SELECT * FROM tbl_1'
+
+    } finally {
+        GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish')
+        GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss')
+    }
+}
+```

