[GitHub] [apisix] panxiaojun233 opened a new issue, #7381: feat: Add integration with OpenSergo, a cloud-native service governance specification

GitBox Mon, 04 Jul 2022 05:53:20 -0700


panxiaojun233 opened a new issue, #7381:
URL: https://github.com/apache/apisix/issues/7381


   ### Description
   
   Hi community,
   
   I'd like to propose a discussion about integration with [the OpenSergo 
service governance spec](https://opensergo.io/), which was initiated by 
open-source communities including Apache Dubbo, Kratos, Spring Cloud Alibaba, 
Sentinel, CloudWeGo and more. OpenSergo is a set of general-purpose, 
language-agnostic cloud-native service governance specifications, which are 
based on scenarios and best practices of microservice governance.
   
   We hope to build further connections with the Apache APISIX community, where 
we can discuss and refine the general service governance specification 
together, including traffic routing, rate limiting, [fault 
tolerance](https://github.com/opensergo/opensergo-specification/blob/main/specification/en/fault-tolerance.md)
 and more.
   
   ---
   
   
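   [OpenSergo](https://opensergo.io/zh-cn/) is an open, general-purpose set of service governance standards for distributed service architectures. It covers heterogeneous ecosystems along the entire call chain and is distilled from industry service governance scenarios and practices. Its defining feature is that it **uses one unified set of configuration/DSL/protocols to define service governance rules, targets multi-language heterogeneous architectures, and covers the full call-chain ecosystem**.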
   
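   The OpenSergo community hopes to work further with the Apache APISIX community to jointly discuss and define a unified service governance standard. Apache APISIX could adapt to and implement this standard, applying unified governance and control at the traffic gateway layer through the same set of standard OpenSergo CRD configurations. This could unlock new value for microservice architectures built on Apache APISIX.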
   
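   Below are the [traffic routing, flow control/degradation, and fault-tolerance specifications from the recently released OpenSergo v1alpha1](https://opensergo.io/zh-cn/blog/opensergo-v1alpha1-is-coming/). We have identified several points where they fit well with Apache APISIX, and we welcome the community to discuss them: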
   
   ## Traffic routing
   
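[Traffic routing](https://github.com/opensergo/opensergo-specification/blob/main/specification/zh-Hans/traffic-routing.md), as the name suggests, routes traffic with certain attributes to specified destinations. It is an important part of traffic governance: on top of the traffic routing standard, developers can implement scenarios such as gray releases, canary releases, disaster-recovery routing, and label-based routing.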
   ### Scenarios
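   Traffic routing rules (v1alpha1) consist of three parts:
   
   - Workload label rule (WorkloadLabelRule): attaches labels to a group of workloads. In APISIX terms, this can be understood as labeling each upstream.
   - Traffic label rule (TrafficLabelRule): attaches labels to traffic with certain attributes.
   - Label-based routing: matches workload labels against traffic labels and routes traffic carrying a given label to the matching workload.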
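The three-part model above can be illustrated with a small routing function. This is a minimal sketch under a simplified, hypothetical data model: a workload carries a set of traffic labels, each request carries at most one label, and unlabeled or unmatched traffic falls back to the route's default workload. It is not taken from the OpenSergo spec or from APISIX.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workload:
    # Hypothetical stand-in for an APISIX upstream / Kubernetes workload.
    name: str
    labels: set = field(default_factory=set)


def route(traffic_label: Optional[str], workloads: List[Workload],
          default: Workload) -> Workload:
    """Send labeled traffic to the first workload carrying the same label.

    Traffic without a label, or whose label matches no workload, falls back
    to the route's default workload.
    """
    if traffic_label is not None:
        for w in workloads:
            if traffic_label in w.labels:
                return w
    return default


if __name__ == "__main__":
    base = Workload("my-app")
    gray = Workload("my-app-gray", {"gray"})
    print(route("gray", [gray], base).name)  # my-app-gray
    print(route(None, [gray], base).name)    # my-app
```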
   
![image](https://user-images.githubusercontent.com/43985911/177156906-2f063667-defb-4a5a-8974-22286928db3e.png)
   
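   Case 1: split traffic according to the `weight` values configured in the plugin's `weighted_upstreams`. Traffic is divided between the plugin's upstream and the route's upstream in a 3:2 ratio: 60% goes to the plugin's `gray` upstream and 40% goes to the route's default upstream.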
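To see how a 3:2 weight ratio becomes a 60%/40% split, here is a sketch of smooth weighted round-robin, the scheme NGINX uses for weighted upstream servers. Whether APISIX's `traffic-split` picks among `weighted_upstreams` with exactly this algorithm is an assumption here; the sketch only shows that weights 3 and 2 send 3 of every 5 requests to `gray`.

```python
from typing import Dict, List


def smooth_wrr(weights: Dict[str, int], n: int) -> List[str]:
    """Return the first n picks of smooth weighted round-robin.

    On every step each candidate's current weight grows by its configured
    weight; the largest current weight wins and is reduced by the total.
    """
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(n):
        for name, w in weights.items():
            current[name] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks


if __name__ == "__main__":
    print(smooth_wrr({"gray": 3, "default": 2}, 5))
    # ['gray', 'default', 'gray', 'default', 'gray']
```

The sequence repeats with period 5, so the picks interleave the two upstreams instead of sending three requests in a row to `gray`.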
   
![image](https://user-images.githubusercontent.com/43985911/177156935-b87eeadc-c334-4641-b2bc-6f4dcf6ea331.png)
   
   
![image](https://user-images.githubusercontent.com/43985911/177156970-e9187b79-e5c6-4eed-ace2-5ff127a44dc9.png)
   
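   Case 2: the `match` rule takes its parameters from request headers (NGINX variables derived from request arguments can be used as well). When the `match` rule passes, all requests go to the upstream configured in the plugin; otherwise, all requests go only to the upstream configured on the route.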
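Case 2 amounts to evaluating a list of `vars` expressions against request variables and picking the upstream accordingly. The sketch below assumes a tiny operator subset (`==`, `!=`, `>`, `<`) and models request variables as a plain dict keyed by NGINX-style names such as `http_x_user_id` and `uri`; real APISIX `vars` expressions support more operators.

```python
import operator
from typing import Dict, List, Sequence

# Assumed operator subset: == and != compare strings, > and < compare numbers.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": lambda a, b: float(a) > float(b),
    "<": lambda a, b: float(a) < float(b),
}


def vars_match(exprs: List[Sequence[str]], request_vars: Dict[str, str]) -> bool:
    """All expressions in one vars rule must hold (logical AND)."""
    for name, op, expected in exprs:
        actual = request_vars.get(name)
        if actual is None or not OPS[op](actual, expected):
            return False
    return True


def pick_upstream(exprs: List[Sequence[str]], request_vars: Dict[str, str],
                  plugin_upstream: str, route_upstream: str) -> str:
    # Match passes: the plugin's upstream; otherwise the route's upstream.
    if vars_match(exprs, request_vars):
        return plugin_upstream
    return route_upstream


if __name__ == "__main__":
    rule = [("http_x_user_id", "==", "12345"), ("uri", "==", "/index")]
    req = {"http_x_user_id": "12345", "uri": "/index"}
    print(pick_upstream(rule, req, "gray", "default"))  # gray
```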
   
![image](https://user-images.githubusercontent.com/43985911/177156987-30bea7be-d8d7-4ad8-8995-397aa7f28440.png)
   
   
![image](https://user-images.githubusercontent.com/43985911/177157011-81af9ff9-c7fb-40b5-a508-403fdac2f357.png)
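   Case 3: only one `vars` rule is configured, and the multiple expressions inside `vars` are combined with AND. `weighted_upstreams` splits traffic 3:2 by `weight`, where the entry that has only a `weight` value represents the share of the route's upstream. When `match` fails, all traffic goes only to the route's upstream.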
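Case 3 combines the two previous behaviors: the `match` rule gates the request, and only matched requests are split by weight, with the weight-only entry standing for the route's upstream. Below is a minimal sketch, assuming exact string equality for every `vars` expression and a simple counter-based (non-smooth) weighted round-robin. `TrafficSplit` and `ROUTE_UPSTREAM` are illustrative names, not APISIX internals.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

ROUTE_UPSTREAM = "route"  # stands for an entry that only has a weight


class TrafficSplit:
    """Hypothetical model of one traffic-split rule: match + weights."""

    def __init__(self, exprs: List[Tuple[str, str]],
                 weighted: List[Tuple[Optional[str], int]]):
        self.exprs = exprs        # (variable, expected value) pairs, ANDed
        self.weighted = weighted  # (upstream name or None, weight)
        self.total = sum(w for _, w in weighted)
        self.counter = count()

    def pick(self, request_vars: Dict[str, str]) -> str:
        # Match fails: all traffic goes to the route's upstream.
        if not all(request_vars.get(k) == v for k, v in self.exprs):
            return ROUTE_UPSTREAM
        # Match passes: walk the cumulative weights with a rolling counter.
        slot = next(self.counter) % self.total
        for upstream, weight in self.weighted:
            if slot < weight:
                return upstream or ROUTE_UPSTREAM
            slot -= weight
        raise AssertionError("unreachable: slot is always < total")


if __name__ == "__main__":
    split = TrafficSplit([("http_x_user_id", "12345")], [("gray", 3), (None, 2)])
    print([split.pick({"http_x_user_id": "12345"}) for _ in range(5)])
    # ['gray', 'gray', 'gray', 'route', 'route']
```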
   
![image](https://user-images.githubusercontent.com/43985911/177157060-2d093682-19ed-4bc6-a043-27a9bff3457e.png)
   
   
![image](https://user-images.githubusercontent.com/43985911/177157100-5defa311-7f79-44f6-9e98-6ae0f6ef7a7c.png)
   
   
![image](https://user-images.githubusercontent.com/43985911/177157125-5420b091-03ba-40b9-b7a6-ed7c6a1f7401.png)
   
   ### Specification
   
   
![image](https://user-images.githubusercontent.com/43985911/177157148-aefab770-664f-41c1-bd37-3dd4ab7e24be.png)
   
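   **Labeling workloads:**
   When rolling out a new version as a gray release, we usually have a separate environment and a separate deployment set. We attach the `gray` label (the label value is customizable) to that deployment set, and the label then takes part in traffic routing.
   Labels can be bound by labeling the Kubernetes workload directly; for example, adding the `traffic.opensergo.io/label: gray` label to a Deployment marks it as gray. For more complex labeling scenarios (such as labeling database or cache instances), the WorkloadLabelRule CRD can be used. Example: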
   ```yaml
   apiVersion: traffic.opensergo.io/v1alpha1
   kind: WorkloadLabelRule
   metadata:
     name: gray-sts-label-rule
   spec:
     workloadLabels: ['gray']
     selector:
       app: my-app-gray
   ```
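The `selector` in the WorkloadLabelRule above is read here with equality-based Kubernetes label-selector semantics: a workload is selected when its labels contain every key/value pair in the selector, and the rule's `workloadLabels` are then attached to it. A minimal sketch under that assumption; the workload dicts and the function name are hypothetical.

```python
from typing import Dict, List


def apply_workload_label_rule(rule: Dict, workloads: List[Dict]) -> Dict[str, List[str]]:
    """Return {workload name: traffic labels} for the workloads the rule selects.

    A workload is selected when its Kubernetes labels contain every
    key/value pair of spec.selector (equality-based matching).
    """
    selector = rule["spec"]["selector"]
    traffic_labels = rule["spec"]["workloadLabels"]
    result = {}
    for wl in workloads:
        k8s_labels = wl.get("labels", {})
        if all(k8s_labels.get(k) == v for k, v in selector.items()):
            result[wl["name"]] = list(traffic_labels)
    return result


if __name__ == "__main__":
    rule = {"spec": {"workloadLabels": ["gray"], "selector": {"app": "my-app-gray"}}}
    workloads = [
        {"name": "my-app", "labels": {"app": "my-app"}},
        {"name": "my-app-gray", "labels": {"app": "my-app-gray"}},
    ]
    print(apply_workload_label_rule(rule, workloads))  # {'my-app-gray': ['gray']}
```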
   **Labeling traffic:**
   Case 1: suppose 60% of the traffic should reach the `gray` workload in the plugin and 40% the route's default workload. Only the following CRD needs to be configured:
   ```yaml
   apiVersion: traffic.opensergo.io/v1alpha1
   kind: TrafficLabelRule
   metadata:
     name: my-traffic-label-rule
     labels:
       app: my-app
   spec:
     selector:
       app: my-app
     trafficLabel: gray
     weight: 60%
   ```
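One possible way for APISIX to consume case 1 is to translate the rule into a `traffic-split` plugin configuration whose `weighted_upstreams` carries the gray share. The sketch below is a hypothetical mapping, not an existing converter. It assumes the rule's `weight` is a percentage string giving the share of traffic labeled gray, and that the caller supplies the upstream object for the gray workload. In `traffic-split`, an entry with only `weight` refers to the route's own upstream.

```python
from typing import Dict


def to_traffic_split(rule: Dict, gray_upstream: Dict) -> Dict:
    """Translate a weighted TrafficLabelRule into a traffic-split config.

    Hypothetical mapping: gray_upstream is the upstream of the workload
    carrying the rule's trafficLabel; the remaining share stays on the
    route's upstream.
    """
    gray_pct = int(str(rule["spec"]["weight"]).rstrip("%"))
    if not 0 <= gray_pct <= 100:
        raise ValueError("weight must be between 0% and 100%")
    return {
        "traffic-split": {
            "rules": [{
                "weighted_upstreams": [
                    {"upstream": gray_upstream, "weight": gray_pct},
                    # An entry with only a weight means the route's upstream.
                    {"weight": 100 - gray_pct},
                ]
            }]
        }
    }
```

For example, `to_traffic_split({"spec": {"trafficLabel": "gray", "weight": "60%"}}, {"type": "roundrobin", "nodes": {"10.0.0.2:8080": 1}})` yields weights 60 and 40; the node address is made up for illustration.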
   
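   Case 2: suppose internal test users should get the new homepage as a gray release. The test user has uid=12345, carried in the `X-User-Id` header, and their requests should be forwarded to the `gray` workload. Only the following CRD needs to be configured: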
   ```yaml
   apiVersion: traffic.opensergo.io/v1alpha1
   kind: TrafficLabelRule
   metadata:
     name: my-traffic-label-rule
     labels:
       app: my-app
   spec:
     selector:
       app: my-app
     trafficLabel: gray
     protocol: http
     match:
     - condition: "=="    # match operator
       type: header       # attribute type to match
       key: 'X-User-Id'   # parameter name
       value: '12345'     # parameter value
     - condition: "=="
       value: "/index"
       type: path
   ```
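The `match` list of case 2 maps fairly directly onto the `vars` of a `traffic-split` `match` rule. In this hypothetical translation, a `header` condition becomes an NGINX-style `http_*` variable (lower-cased, with hyphens turned into underscores) and a `path` condition becomes `uri`. For brevity, only the `==` condition is handled.

```python
from typing import Dict, List


def header_var(name: str) -> str:
    # NGINX exposes request headers as $http_<lower-cased name, '-' -> '_'>.
    return "http_" + name.lower().replace("-", "_")


def to_match_rule(conditions: List[Dict]) -> Dict:
    """Translate OpenSergo match conditions into one APISIX `vars` rule.

    All expressions go into a single vars list, so they are ANDed.
    """
    exprs = []
    for cond in conditions:
        if cond["condition"] != "==":
            raise ValueError(f"unsupported condition: {cond['condition']}")
        if cond["type"] == "header":
            var = header_var(cond["key"])
        elif cond["type"] == "path":
            var = "uri"
        else:
            raise ValueError(f"unsupported type: {cond['type']}")
        exprs.append([var, "==", str(cond["value"])])
    return {"vars": exprs}


if __name__ == "__main__":
    conds = [
        {"condition": "==", "type": "header", "key": "X-User-Id", "value": "12345"},
        {"condition": "==", "value": "/index", "type": "path"},
    ]
    print(to_match_rule(conds))
    # {'vars': [['http_x_user_id', '==', '12345'], ['uri', '==', '/index']]}
```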
   
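   Case 3: suppose internal test users should get the new homepage as a gray release. The test user has uid=12345, carried in the `X-User-Id` header, and the test user's traffic should be forwarded to the `gray` workload. For the remaining production users' traffic (that is, traffic that does not match the rule), 60% should reach the `gray` workload in the plugin and 40% the route's default workload. Only the following CRD needs to be configured: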
   ```yaml
   apiVersion: traffic.opensergo.io/v1alpha1
   kind: TrafficLabelRule
   metadata:
     name: my-traffic-label-rule
     labels:
       app: my-app
   spec:
     selector:
       app: my-app
     trafficLabel: gray
     weight: 60%
     protocol: http
     match:
     - condition: "=="    # match operator
       type: header       # attribute type to match
       key: 'X-User-Id'   # parameter name
       value: '12345'     # parameter value
     - condition: "=="
       value: "/index"
       type: path
   ```
   ### References:
   
[https://apisix.apache.org/zh/docs/apisix/plugins/traffic-split](https://apisix.apache.org/zh/docs/apisix/plugins/traffic-split)
   ## Flow control, degradation, and fault tolerance
   
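[Flow control, degradation, and fault tolerance](https://github.com/opensergo/opensergo-specification/blob/main/specification/zh-Hans/fault-tolerance.md) is likewise a key part of service traffic governance. Starting from traffic, it safeguards service stability through flow control, circuit breaking and degradation, traffic smoothing, adaptive overload protection, and similar means.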
   ### Scenarios
   
   - Flow control (basic): limiting the request rate
   - Concurrency control
   - Circuit-breaker protection
   ### Specification
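   Each rule (FaultToleranceRule) consists of the following three parts:
   
   - Target: which requests the rule applies to
   - Strategy: the fault-tolerance or control strategy, such as flow control, circuit breaking, concurrency control, adaptive overload protection, or outlier instance ejection
   - FallbackAction: the fallback behavior once the rule is triggered, such as returning a specific error or status code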
   
   
![image](https://user-images.githubusercontent.com/43985911/177157203-0c858ad6-bf2e-498b-868d-3469df7aacd7.png)
   
   
   
   **limit-req**
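   The following example defines a cluster-wide flow control strategy that allows no more than 10 requests per second across the whole cluster. Example CR YAML: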
   ```yaml
   apiVersion: fault-tolerance.opensergo.io/v1alpha1
   kind: RateLimitStrategy
   metadata:
     name: rate-limit-foo
   spec:
     metricType: RequestAmount
     limitMode: Global
     threshold: 10
     statDuration: "1s"
   ```
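The strategy above (`threshold: 10`, `statDuration: "1s"`) can be read as a fixed-window counter. The sketch below is a single-process illustration only. With `limitMode: Global`, the counter would have to live in shared storage (APISIX's `limit-count` plugin, for instance, can keep its counters in Redis), which is out of scope here. Timestamps are passed in explicitly to keep the sketch deterministic.

```python
import math
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `threshold` requests per aligned window of `window` seconds."""

    def __init__(self, threshold: int, window: float):
        self.threshold = threshold
        self.window = window
        self.current_window: Optional[int] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        # Windows are aligned to multiples of `window`; a new one resets the count.
        window_id = math.floor(now / self.window)
        if window_id != self.current_window:
            self.current_window = window_id
            self.count = 0
        if self.count >= self.threshold:
            return False
        self.count += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(threshold=10, window=1.0)  # mirrors rate-limit-foo
    results = [limiter.allow(0.5) for _ in range(12)]
    print(results.count(True))  # 10
```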
   
   We then attach the flow control strategy above to the `foo-route` route:
   ```yaml
   apiVersion: fault-tolerance.opensergo.io/v1alpha1
   kind: FaultToleranceRule
   metadata:
     name: my-rule
     namespace: prod
     labels:
       app: my-apisix # name of the application the rule applies to
   spec:
     targets:
       - targetResourceName: 'foo-route'
     strategies: 
       - name: rate-limit-foo
     fallbackAction: fallback-foo
   ```
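To make the mapping concrete, here is a hypothetical translation of the `rate-limit-foo` strategy used by the `foo-route` rule into an APISIX `limit-count` plugin configuration. It makes several assumptions: `statDuration` uses whole seconds with an `s` suffix, `Global` maps to the `redis` policy and `Local` to `local`, rejected requests get status 429, and the APISIX version in use supports `key_type: "constant"` so that all requests share one counter. It omits the Redis connection settings that the `redis` policy also needs, along with any handling of `fallbackAction`.

```python
from typing import Dict


def to_limit_count(strategy: Dict) -> Dict:
    """Hypothetical mapping of a RateLimitStrategy CR to limit-count config."""
    spec = strategy["spec"]
    if spec["metricType"] != "RequestAmount":
        raise ValueError("only RequestAmount is handled in this sketch")
    duration = spec["statDuration"]
    if not duration.endswith("s") or not duration[:-1].isdigit():
        raise ValueError("this sketch only accepts whole seconds, e.g. '1s'")
    return {
        "limit-count": {
            "count": spec["threshold"],
            "time_window": int(duration[:-1]),
            # Assumed: one shared counter for all clients (cluster-wide total).
            "key_type": "constant",
            "policy": "redis" if spec["limitMode"] == "Global" else "local",
            "rejected_code": 429,
        }
    }
```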
   
   **limit-conn**
   ConcurrencyLimitStrategy contains the following fields:
   
   | Field | Required | Type | Description |
   | --- | --- | --- | --- |
   | maxConcurrency | required | int | Maximum number of concurrent requests |
   | limitMode | required | string (enum) | Control mode: per node `Local`, or cluster-wide `Global` |
   
   Example CR YAML:
   ```yaml
   apiVersion: fault-tolerance.opensergo.io/v1alpha1
   kind: ConcurrencyLimitStrategy
   metadata:
     name: concurrency-limit-foo
   spec:
     maxConcurrency: 8
     limitMode: 'Local'
   ```
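In `Local` mode, `maxConcurrency: 8` amounts to a per-node counter of in-flight requests. The following is a minimal single-threaded sketch. A real gateway would need one counter per worker or one in shared memory, and APISIX's `limit-conn` plugin also supports a `burst` allowance, which this sketch does not model.

```python
class ConcurrencyLimiter:
    """Reject new requests once max_concurrency requests are in flight."""

    def __init__(self, max_concurrency: int):
        self.max_concurrency = max_concurrency
        self.in_flight = 0

    def try_acquire(self) -> bool:
        # Called when a request arrives.
        if self.in_flight >= self.max_concurrency:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Called when a request finishes, successfully or not.
        if self.in_flight == 0:
            raise RuntimeError("release() without a matching try_acquire()")
        self.in_flight -= 1


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_concurrency=8)  # mirrors concurrency-limit-foo
    print([limiter.try_acquire() for _ in range(9)].count(True))  # 8
```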
   
   
   **api-breaker**
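   CircuitBreakerStrategy corresponds to the standard circuit breaker pattern in microservice design and takes effect per node. CircuitBreakerStrategy contains the following fields: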
   
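   - strategy: the circuit-breaking strategy. Currently supported: slow-request ratio `SlowRequestRatio` and error ratio `ErrorRequestRatio`
   - triggerRatio: the ratio that trips the breaker
   - statDuration: the statistics window, such as `1s` or `5min`; a timeUnit form could also be considered
   - recoveryTimeout: how long to wait after the breaker opens; after this wait, it enters half-open recovery mode
   - minRequestAmount: the minimum number of requests within the statistics window
   - slowConditions: conditions for the slow-request strategy; required if the strategy is slow-request ratio
      - maxAllowedRt: under the slow-request strategy, requests whose response time exceeds this value count as slow
   - errorConditions: conditions for the error strategy; required if the strategy is error ratio
      - The concrete representation still needs discussion, mainly what counts as an error (for example, based on HTTP status codes or gRPC return codes)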
   
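   The following example defines a slow-request-ratio circuit breaking strategy: within 30s, if the proportion of requests taking longer than 500ms reaches 60% and at least 5 requests have been made, the breaker trips automatically, with a recovery timeout of 5s. Example CR YAML: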
   ```yaml
   apiVersion: fault-tolerance.opensergo.io/v1alpha1
   kind: CircuitBreakerStrategy
   metadata:
     name: circuit-breaker-slow-foo
   spec:
     strategy: SlowRequestRatio
     triggerRatio: '60%'
     statDuration: '30s'
     recoveryTimeout: '5s'
     minRequestAmount: 5
     slowConditions:
       maxAllowedRt: '500ms'
   ```
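The slow-request-ratio breaker above can be modeled as a three-state machine (closed, open, half-open). The sketch below is a simplified single-node illustration with explicit timestamps. Instead of a sliding window, it uses one tumbling statistics window. In the half-open state, it lets a single probe request decide whether to close or reopen; this probe policy is an assumption, not part of the spec text.

```python
class SlowRatioBreaker:
    """Slow-request-ratio circuit breaker (single node, tumbling window)."""

    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, trigger_ratio: float = 0.6, stat_duration: float = 30.0,
                 recovery_timeout: float = 5.0, min_requests: int = 5,
                 max_allowed_rt: float = 0.5):
        # Defaults mirror circuit-breaker-slow-foo (all times in seconds).
        self.trigger_ratio = trigger_ratio
        self.stat_duration = stat_duration
        self.recovery_timeout = recovery_timeout
        self.min_requests = min_requests
        self.max_allowed_rt = max_allowed_rt
        self.state = self.CLOSED
        self.opened_at = 0.0
        self._reset_window(0.0)

    def _reset_window(self, now: float) -> None:
        self.window_start = now
        self.total = 0
        self.slow = 0

    def allow(self, now: float) -> bool:
        """Decide whether a request may pass at time `now`."""
        if self.state == self.OPEN:
            if now - self.opened_at < self.recovery_timeout:
                return False
            self.state = self.HALF_OPEN  # let a single probe through
            return True
        # In the half-open state the probe is already in flight.
        return self.state == self.CLOSED

    def record(self, now: float, rt: float) -> None:
        """Report the response time (seconds) of a finished request."""
        is_slow = rt > self.max_allowed_rt
        if self.state == self.OPEN:
            return  # ignore late results from before the breaker opened
        if self.state == self.HALF_OPEN:
            if is_slow:
                self.state, self.opened_at = self.OPEN, now
            else:
                self.state = self.CLOSED
                self._reset_window(now)
            return
        if now - self.window_start >= self.stat_duration:
            self._reset_window(now)
        self.total += 1
        self.slow += int(is_slow)
        if (self.total >= self.min_requests
                and self.slow / self.total >= self.trigger_ratio):
            self.state, self.opened_at = self.OPEN, now
```

With the defaults, three slow (600ms) and two fast requests within the window give a ratio of 3/5 = 60%, which trips the breaker on the fifth request; five seconds later, one probe is allowed through.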
   ### References:
   
[https://apisix.apache.org/zh/docs/apisix/plugins/limit-req](https://apisix.apache.org/zh/docs/apisix/plugins/limit-req)
   
[https://apisix.apache.org/zh/docs/apisix/plugins/limit-conn](https://apisix.apache.org/zh/docs/apisix/plugins/limit-conn)
   
[https://apisix.apache.org/zh/docs/apisix/plugins/limit-count](https://apisix.apache.org/zh/docs/apisix/plugins/limit-count)
   
[https://apisix.apache.org/zh/docs/apisix/plugins/api-breaker](https://apisix.apache.org/zh/docs/apisix/plugins/api-breaker)
   
   Discussions are welcome!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
