luozenglin opened a new pull request, #10533:
URL: https://github.com/apache/doris/pull/10533

   …ity by introducing OpenTelemetry.
   
   The collection of query traces is implemented in fe and be, and the spans 
are exported to zipkin.
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   DSIP: 
<https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry>
   
   ## Problem Summary:
   
   This PR implements the collection and sending of query traces. fe turns on 
tracing, collects spans, and propagates the trace information to be via RPC. 
Eventually, each node exports its collected spans to zipkin. 
   
   ```
   ┌──────────────────────┐
   │                   fe │
   │                      │
   │        span root     │
   │            │         │
   │    ┌───────┴────┐    ├────────────────┐
   │    │            │    │                │
   │  span A      span B  │                │
   │                │     │                │
   │                │     │                │
   └────────────────┼─────┘                │
                    │                      │
              ┌─────┴───────────────┐      │
              │                     │      │
   ┌──────────┼───────────┐                │
   │          │       be  │        be      │          ┌────────────┐
   │          │           │                │          │            │
   │         span C       │                │   span   │            │
   │          │           ├────────────────┴──────────►   zipkin   │
   │    ┌─────┴───────┐   │                           │            │
   │    │             │   │                           │            │
   │   span D      span E │                           │            │
   │                      │                           └────────────┘
   └──────────────────────┘
   ```
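
The hand-off in the diagram, where span B on fe becomes the parent of span C on be, can be sketched roughly as follows. This is an illustration only, not the code in this PR: it assumes the context travels with the RPC as a W3C-style `traceparent` string, and `Span`, `start_span`, `inject`, and `extract` are hypothetical names.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                    # 32 hex chars, shared by every span of one query
    span_id: str                     # 16 hex chars, unique per span
    parent_id: Optional[str] = None  # None for the root span


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child that inherits the parent's trace id."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


def inject(span: Span) -> str:
    """Serialize the span context (assumed traceparent layout) for the RPC request."""
    return f"00-{span.trace_id}-{span.span_id}-01"


def extract(header: str) -> Span:
    """Rebuild the remote parent's context on the receiving node."""
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError(f"malformed trace context: {header!r}")
    return Span("remote-parent", trace_id, span_id)


# fe side: root span with two children; span B issues the RPC.
root = start_span("span root")
span_a = start_span("span A", parent=root)
span_b = start_span("span B", parent=root)
header = inject(span_b)  # sent to be alongside the RPC payload

# be side: span C continues the same trace under span B.
span_c = start_span("span C", parent=extract(header))
span_d = start_span("span D", parent=span_c)
span_e = start_span("span E", parent=span_c)
```

Because every span carries the same trace id, a collector such as zipkin can reassemble the tree even though fe and be export their spans independently.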
   
   ### Example:
   
   1. Start zipkin:
   
   ```
   curl -sSL https://zipkin.io/quickstart.sh | bash -s
   java -jar zipkin.jar
   ```
   
   2. Add Configuration:
   
   fe.conf:
   ```
   enable_tracing=true
   trace_export_url=http://127.0.0.1:9411/api/v2/spans
   ```
   be.conf:
   ```
   enable_tracing=true
   trace_export_url=http://127.0.0.1:9411/api/v2/spans
   ```
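
Before running a query, it may help to confirm that the endpoint in `trace_export_url` accepts spans. The sketch below builds one span in Zipkin's v2 JSON format and posts it. `build_test_span`, `post_spans`, and the service name `doris-smoke-test` are hypothetical and not part of this PR.

```python
import json
import secrets
import time
import urllib.request

# Same value as trace_export_url in fe.conf / be.conf above.
TRACE_EXPORT_URL = "http://127.0.0.1:9411/api/v2/spans"


def build_test_span(service_name: str = "doris-smoke-test") -> dict:
    """Build one span in Zipkin's v2 JSON format (times are in microseconds)."""
    return {
        "traceId": secrets.token_hex(16),
        "id": secrets.token_hex(8),
        "name": "smoke-test",
        "timestamp": int(time.time() * 1_000_000),
        "duration": 1000,
        "localEndpoint": {"serviceName": service_name},
    }


def post_spans(spans: list, url: str = TRACE_EXPORT_URL) -> int:
    """POST a JSON array of spans and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Usage (needs a running zipkin from step 1):
#   status = post_spans([build_test_span()])
```

If the call succeeds, the test span should be searchable in the zipkin UI under the chosen service name.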
   
   3.  Execute a query
   
   4.  From zipkin UI:
   ![Screenshot 2022-06-30 19 51 41](https://user-images.githubusercontent.com/37725793/176691405-bab41a65-a050-41fa-a296-454ad3953c38.png)
   
   ### Performance loss
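   I ran the SSB multi-table join performance test. Enabling tracing has no 
significant effect on query performance. The table lists three measurements 
without tracing (`before`) and three with it (`tracing`):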
   ||before|before|before|tracing|tracing|tracing|
   |:--|:--|:--|:--|:--|:--|:--|
   |q1  |570.2  |557.0  |584.4  |570.4  |597.2  |577.6|
   |q2  |84.2   |83.8   |86.2   |84.0   |84.2   |81.4|
   |q3  |69.6   |72.0   |67.6   |72.4   |70.6   |68.8|
   |q4  |1026.8 |1185.4 |1055.4 |1038.8 |1006.4 |1055.2|
   |q5  |383.2  |432.4  |380.6  |409.0  |398.6  |419.2|
   |q6  |390.6  |441.6  |395.2  |421.0  |405.8  |406.0|
   |q7  |6838.0 |6232.2 |6639.6 |6666.4 |6793.4 |6896.6|
   |q8  |409.2  |367.6  |376.8  |401.6  |430.8  |391.0|
   |q9  |368.2  |348.4  |348.8  |352.4  |354.8  |354.0|
   |q10 |97.2   |98.4   |93.4   |101.0  |98.8   |98.2|
   |q11 |3922.6 |4999.2 |3911.6 |4409.0 |4066.6 |4059.0|
   |q12 |1325.0 |1708.2 |1273.4 |1291.4 |1282.0 |1280.6|
   |q13 |584.4  |676.2  |537.0  |551.2  |535.2  |551.2|
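
One way to read the table is to compare the mean of the `tracing` runs against the mean of the `before` runs, and to set that difference against the run-to-run spread of the baseline. The sketch below does this for two rows copied from the table. The helper names are hypothetical, and with only three runs per configuration the result is a rough indication, not a statistical test.

```python
from statistics import mean

# Values copied from the q1 and q7 rows of the table.
runs = {
    "q1": {"before": [570.2, 557.0, 584.4], "tracing": [570.4, 597.2, 577.6]},
    "q7": {"before": [6838.0, 6232.2, 6639.6], "tracing": [6666.4, 6793.4, 6896.6]},
}


def relative_change(before: list, tracing: list) -> float:
    """Change of the mean with tracing on, as a fraction of the baseline mean."""
    baseline = mean(before)
    return (mean(tracing) - baseline) / baseline


def run_spread(values: list) -> float:
    """Run-to-run spread (max - min) relative to the mean; a rough noise estimate."""
    return (max(values) - min(values)) / mean(values)


for query, data in runs.items():
    change = relative_change(data["before"], data["tracing"])
    noise = run_spread(data["before"])
    print(f"{query}: change {change:+.1%}, baseline spread {noise:.1%}")
```

A change that is smaller than the baseline spread cannot be distinguished from noise at this sample size.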
   
   jmeter stress test:
   
   before:
   ```
   summary +  14000 in 00:00:07 = 2138.1/s Avg:    42 Min:     7 Max:   238 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary +  68872 in 00:00:30 = 2295.8/s Avg:    43 Min:     4 Max:   321 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary =  82872 in 00:00:37 = 2267.5/s Avg:    43 Min:     4 Max:   321 Err:     0 (0.00%)
   summary +  70385 in 00:00:30 = 2346.2/s Avg:    42 Min:     5 Max:  1048 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary = 153257 in 00:01:07 = 2303.0/s Avg:    43 Min:     4 Max:  1048 Err:     0 (0.00%)
   summary +  69031 in 00:00:30 = 2301.0/s Avg:    43 Min:     4 Max:  1052 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary = 222288 in 00:01:37 = 2302.4/s Avg:    43 Min:     4 Max:  1052 Err:     0 (0.00%)
   summary +   8316 in 00:00:04 = 2337.3/s Avg:    42 Min:     9 Max:  1047 Err:     0 (0.00%) Active: 0 Started: 100 Finished: 100
   summary = 230604 in 00:01:40 = 2303.6/s Avg:    43 Min:     4 Max:  1052 Err:     0 (0.00%)
   ```
   tracing:
   ```
   summary +      1 in 00:00:00 =    3.5/s Avg:   124 Min:   124 Max:   124 Err:     0 (0.00%) Active: 23 Started: 23 Finished: 0
   summary +  64305 in 00:00:28 = 2334.5/s Avg:    42 Min:     4 Max:  1056 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary =  64306 in 00:00:28 = 2310.8/s Avg:    42 Min:     4 Max:  1056 Err:     0 (0.00%)
   summary +  69646 in 00:00:30 = 2321.6/s Avg:    43 Min:     6 Max:  1051 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary = 133952 in 00:00:58 = 2316.4/s Avg:    42 Min:     4 Max:  1056 Err:     0 (0.00%)
   summary +  69486 in 00:00:30 = 2316.2/s Avg:    43 Min:     4 Max:  1057 Err:     0 (0.00%) Active: 100 Started: 100 Finished: 0
   summary = 203438 in 00:01:28 = 2316.3/s Avg:    42 Min:     4 Max:  1057 Err:     0 (0.00%)
   summary +  28284 in 00:00:12 = 2303.8/s Avg:    43 Min:     7 Max:  1052 Err:     0 (0.00%) Active: 0 Started: 100 Finished: 100
   summary = 231722 in 00:01:40 = 2314.8/s Avg:    42 Min:     4 Max:  1057 Err:     0 (0.00%)
   ```
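
The two jmeter runs can be compared from their final cumulative `summary =` lines. The sketch below parses those lines with a regular expression and reports the relative throughput change. `SUMMARY_RE` and `parse_summary` are hypothetical helpers written for this log layout only.

```python
import re

# Matches the cumulative "summary =" lines printed by the jmeter summariser.
SUMMARY_RE = re.compile(
    r"summary =\s*(?P<count>\d+) in (?P<elapsed>[\d:]+) =\s*(?P<rate>[\d.]+)/s"
    r"\s+Avg:\s*(?P<avg>\d+)\s+Min:\s*(?P<min>\d+)\s+Max:\s*(?P<max>\d+)"
)


def parse_summary(line: str) -> dict:
    """Extract request count, throughput, and latency figures from one line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a cumulative summary line: {line!r}")
    return {
        "count": int(match["count"]),
        "rate": float(match["rate"]),
        "avg": int(match["avg"]),
        "min": int(match["min"]),
        "max": int(match["max"]),
    }


# Final cumulative line of each run, copied from the logs.
before = parse_summary(
    "summary = 230604 in 00:01:40 = 2303.6/s Avg:    43 Min:     4 Max:  1052 Err:     0 (0.00%)"
)
tracing = parse_summary(
    "summary = 231722 in 00:01:40 = 2314.8/s Avg:    42 Min:     4 Max:  1057 Err:     0 (0.00%)"
)

throughput_change = (tracing["rate"] - before["rate"]) / before["rate"]
print(f"throughput change with tracing: {throughput_change:+.2%}")
```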
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[[email protected]](mailto:[email protected]) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

