Hello, community!

As we all know, configuration synchronization in APISIX relies on etcd. Once an 
administrator creates, updates, or deletes a config instance, the change is 
detected by all APISIX instances almost immediately. That's cool, but the scope 
is ALL INSTANCES, which also means all instances might break down if the config 
instance is malformed (perhaps because of a missing check). That's not 
ops-friendly.

We're familiar with grayscale releases for server instances: use a small 
fraction of traffic to verify a new release and limit the impact of faults. 
So why not verify a newly issued config instance in the same way? I call this 
"configuration grayscale".

Using "configuration grayscale" is simple. All we need is an indication that 
tells the current APISIX instance whether it should apply this config 
instance, so we can add a new item to each configuration (such as route or 
upstream):


{
    "upstream": {
        "nodes": {
            "127.0.0.1:8080": 1
        } 
    },

    "annotations": {
        "grayscale": {
            "hostname": [
                "apisix-node1",
                "apisix-node3"
            ]
        }
    }
}

Here we put "grayscale" into a more general field, "annotations", rather than 
flattening it, which is more flexible and clearer. The above example tells the 
APISIX instance to check the grayscale condition first by comparing its own 
hostname with the grayscale targets (whether it's in the hostname list). If the 
grayscale hits, the APISIX instance applies the config instance; otherwise, it 
ignores the config instance as if it had never received it. The hostname 
comparison is just a simple example, and it doesn't mean we can only use this 
type of grayscale condition. For instance, we could use the Nginx built-in 
variable system to support more flexible grayscale conditions:

{
    "upstream": {
        "nodes": {
            "127.0.0.1:8080": 1
        } 
    },

    "annotations": {
        "grayscale": {
            "vars": [
                ["$pid", "==", "12349"]
            ]
        }
    }
}
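
To make the matching rule concrete, here is a minimal Python sketch of how an 
instance might evaluate the two example conditions above. It is only an 
illustration of the proposal, not APISIX code: the function name 
grayscale_hits, the operator table, and the choice to ignore a config when a 
variable is missing or an operator is unknown are all assumptions.

```python
import operator
import socket

# Hypothetical operator table; the proposal itself only shows "==".
_OPS = {"==": operator.eq, "!=": operator.ne}


def grayscale_hits(config, hostname=None, variables=None):
    """Return True if this instance should apply `config`."""
    if hostname is None:
        hostname = socket.gethostname()
    variables = variables or {}

    grayscale = config.get("annotations", {}).get("grayscale")
    if not grayscale:
        # No grayscale condition: every instance applies the config.
        return True

    # Hostname condition: this instance must be one of the listed targets.
    if "hostname" in grayscale and hostname not in grayscale["hostname"]:
        return False

    # Variable conditions: every [name, op, expected] triple must hold.
    for name, op, expected in grayscale.get("vars", []):
        compare = _OPS.get(op)
        actual = variables.get(name.lstrip("$"))
        if compare is None or actual is None:
            # Assumption: unknown operator or missing variable means "no hit".
            return False
        if not compare(str(actual), expected):
            return False
    return True


route = {"annotations": {"grayscale": {"hostname": ["apisix-node1",
                                                    "apisix-node3"]}}}
assert grayscale_hits(route, hostname="apisix-node1")
assert not grayscale_hits(route, hostname="apisix-node2")
```

An instance for which grayscale_hits returns False would simply skip the 
config instance, as described above.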

We need to discuss the most suitable grayscale approach for APISIX, one that 
covers most of what an APISIX administrator needs.

The situation gets complicated if grayscale is present in a config dependency 
(e.g. a route depends on an upstream). To better describe this problem, let's 
say we have two configs A and B, and A depends on B. There are several 
situations we need to consider.

1) Both A and B have the grayscale conditions

In such a case, the grayscale conditions must be the same; otherwise some 
instances will apply only one of A and B, and requests on those instances 
cannot be handled properly.

2) A has grayscale conditions but B does not

Since A depends on B and B can be applied unconditionally, there is no problem 
when A has grayscale conditions.

3) B has grayscale conditions but A does not

This means APISIX instances outside of B's scope cannot find B, and requests 
on them cannot be handled correctly.

Based on these situations, we should add some limitations to avoid such 
complications. For example, don't gray release two config instances at the 
same time when they are related; test the "leaf" config instance first (B in 
the example above), make sure it's stable, and then try the next one.
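
The three situations above could be checked before a config is accepted. The 
following Python sketch classifies them for hostname-based conditions only. 
The function names are hypothetical, and the rule for situation 1 follows the 
text above literally (the conditions must be the same).

```python
def grayscale_hosts(config):
    """Return the set of grayscale hostnames, or None if unconditional."""
    grayscale = config.get("annotations", {}).get("grayscale")
    if not grayscale:
        return None
    return set(grayscale.get("hostname", []))


def check_dependency(a, b):
    """Check config `a`, which depends on config `b`.

    Returns (ok, reason), where ok is False when some instance could
    apply `a` without being able to find `b`.
    """
    hosts_a = grayscale_hosts(a)
    hosts_b = grayscale_hosts(b)

    if hosts_b is None:
        # Situation 2 (or no grayscale at all): B is visible everywhere.
        return True, "B is applied unconditionally"
    if hosts_a is None:
        # Situation 3: instances outside B's scope apply A but cannot find B.
        return False, "B has grayscale conditions but A does not"
    if hosts_a == hosts_b:
        # Situation 1 with identical conditions.
        return True, "A and B share the same grayscale conditions"
    # Situation 1 with different conditions.
    return False, "A and B have different grayscale conditions"


gray = {"annotations": {"grayscale": {"hostname": ["apisix-node1"]}}}
plain = {}
assert check_dependency(gray, plain)[0]       # situation 2
assert not check_dependency(plain, gray)[0]   # situation 3
assert check_dependency(gray, gray)[0]        # situation 1, same conditions
```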

Let's take a more concrete example. Alice needs to create a new route that 
proxies requests whose URI is prefixed by "/api/v1/trade" to the upstream 
"trade-system". First she adds the upstream; no other route in APISIX uses it 
yet. Then she creates the route that will use this upstream, but she isn't sure 
whether the upstream and the route are absolutely right. So when she creates 
the route on the APISIX dashboard, she marks it as grayscale so that only the 
node named "apigw-sh-1" applies it. After creating it, she monitors the 
behavior on that node for a while. One day later, all related requests on 
"apigw-sh-1" meet expectations, so she cancels the grayscale and now every 
APISIX instance applies the route.
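
As a rough illustration of Alice's two steps, the Python sketch below builds 
the route she might submit and then derives the version without the grayscale 
condition. The "uri" pattern and the "upstream_id" value are assumptions about 
how her route would be written, and cancel_grayscale is a hypothetical helper, 
not an existing API.

```python
import copy
import json

# Step 1: the route as Alice might create it, limited to one node.
route = {
    "uri": "/api/v1/trade*",          # assumed prefix-match pattern
    "upstream_id": "trade-system",    # assumed ID of the upstream she added
    "annotations": {
        "grayscale": {"hostname": ["apigw-sh-1"]},
    },
}


def cancel_grayscale(config):
    """Return a copy of `config` with the grayscale condition removed."""
    promoted = copy.deepcopy(config)
    annotations = promoted.get("annotations", {})
    annotations.pop("grayscale", None)
    if not annotations:
        # Drop the now-empty "annotations" field entirely.
        promoted.pop("annotations", None)
    return promoted


# Step 2: after a day of monitoring, every instance may apply the route.
print(json.dumps(cancel_grayscale(route), indent=4))
```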

Support for configuration grayscale can be gradual: we may support core 
configurations like Route first, let users try this feature, and gather more 
feedback.


Chao Zhang
[email protected]


