Hello, community!
As we all know, APISIX relies on etcd for configuration synchronization. Once
an administrator creates, updates, or deletes a config instance, the change is
detected by all APISIX instances immediately. That's cool, but the scope is ALL
INSTANCES, which also means all instances might break down if the config
instance is malformed (for example, because some validation is missing). That's
not ops-friendly.
We're familiar with grayscale releases for server instances: a small fraction
of traffic is used to verify a new release, which reduces the impact of faults.
So why not use the same approach to verify newly issued config instances? I
call this "configuration grayscale".
Using "configuration grayscale" is simple. All we need is an indication that
tells the current APISIX instance whether it should apply this config instance,
so we can add a new field to each kind of configuration (such as route and
upstream):
{
  "upstream": {
    "nodes": {
      "127.0.0.1:8080": 1
    }
  },
  "annotations": {
    "grayscale": {
      "hostname": [
        "apisix-node1",
        "apisix-node3"
      ]
    }
  }
}
Here we put "grayscale" into a more general field, "annotations", rather than
flattening it, which is more flexible and clearer. The above example tells the
APISIX instance to check the grayscale condition first: it simply checks
whether its own hostname is in the grayscale hostname list. If the grayscale
condition hits, the APISIX instance applies the config instance; otherwise, it
ignores this config instance as if it had never received it. Hostname
comparison is just a simple example; it doesn't mean this is the only type of
grayscale condition we can use. For instance, we may use the Nginx built-in
variable system to support more flexible grayscale conditions:
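To make the hostname check concrete, here is a minimal Python sketch of the per-instance decision (APISIX itself is written in Lua; Python is used only for illustration). The function name `hostname_grayscale_hit` is hypothetical, and the sketch assumes that a config without `annotations.grayscale` applies everywhere and that a grayscale condition it does not understand means "do not apply".

```python
import socket
from typing import Any, Mapping, Optional


def hostname_grayscale_hit(config: Mapping[str, Any], hostname: Optional[str] = None) -> bool:
    """Return True if this APISIX instance should apply `config` (hypothetical helper)."""
    if hostname is None:
        # Compare against the local machine's hostname by default.
        hostname = socket.gethostname()
    grayscale = config.get("annotations", {}).get("grayscale")
    if not grayscale:
        # No grayscale condition: the config applies to every instance.
        return True
    targets = grayscale.get("hostname")
    if targets is None:
        # This sketch only understands hostname conditions; anything else
        # is treated conservatively as "do not apply".
        return False
    # The grayscale hits only if this instance is one of the listed targets.
    return hostname in targets


route = {
    "upstream": {"nodes": {"127.0.0.1:8080": 1}},
    "annotations": {"grayscale": {"hostname": ["apisix-node1", "apisix-node3"]}},
}
print(hostname_grayscale_hit(route, "apisix-node1"))  # True
print(hostname_grayscale_hit(route, "apisix-node2"))  # False
```

Defaulting to `socket.gethostname()` mirrors "compare its hostname", while passing the hostname explicitly keeps the check easy to test.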
{
  "upstream": {
    "nodes": {
      "127.0.0.1:8080": 1
    }
  },
  "annotations": {
    "grayscale": {
      "vars": [
        [ "$pid", "==", "12349" ]
      ]
    }
  }
}
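A sketch of how the "vars" form might be evaluated, again in Python for illustration. It assumes each condition is a [variable, operator, value] triple, that all conditions must match (AND semantics), and that the caller collects variable values (for example `pid`) into a plain dict. The operator table and the name `vars_grayscale_hit` are hypothetical, not an existing APISIX API.

```python
import operator
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical operator table; the proposal does not fix the set of operators.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "==": operator.eq,
    "~=": operator.ne,
}


def vars_grayscale_hit(conditions: Sequence[Sequence[str]], variables: Mapping[str, str]) -> bool:
    """Return True only if every [name, op, value] condition matches (AND semantics)."""
    for name, op, expected in conditions:
        compare = OPERATORS.get(op)
        if compare is None:
            raise ValueError(f"unsupported operator: {op}")
        # Accept both "$pid" and "pid" as the variable name.
        actual = variables.get(name.lstrip("$"))
        if actual is None or not compare(actual, expected):
            # A missing variable or a failed comparison is a miss.
            return False
    return True
```

For example, `vars_grayscale_hit([["$pid", "==", "12349"]], {"pid": "12349"})` returns `True`, while an instance with a different pid ignores the config. Treating an unknown variable as a miss keeps the default safe: a condition the instance cannot evaluate never causes it to apply the config.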
We need to discuss the most suitable grayscale mechanism for APISIX, one that
covers almost all the needs an APISIX administrator may have.
The situation becomes complicated if grayscale is present in a config
dependency (e.g. a route depends on an upstream). To describe this problem,
let's say we have two kinds of config, A and B, and A depends on B. There are
several situations we need to consider.
1) Both A and B have grayscale conditions
In this case, the grayscale conditions must be the same; otherwise, some
instances cannot apply both A and B, and requests on those instances cannot be
handled properly.
2) A has grayscale conditions but B does not
Since A depends on B and B is applied unconditionally, A having grayscale
conditions causes no problem.
3) B has grayscale conditions but A does not
This means APISIX instances outside B's apply scope cannot find B, so requests
cannot be handled correctly.
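The failure mode shared by cases 1 and 3 is an instance that applies A but cannot find B. The hypothetical Python sketch below models each grayscale condition as a predicate over the instance hostname (`None` meaning no condition) and lists the instances that would hit that failure.

```python
from typing import Callable, Iterable, List, Optional

# A grayscale condition is modeled as a predicate over an instance's hostname;
# None means the config has no grayscale condition and applies everywhere.
Condition = Optional[Callable[[str], bool]]


def applies(cond: Condition, host: str) -> bool:
    """True if a config with condition `cond` is applied on `host`."""
    return cond is None or cond(host)


def broken_instances(hosts: Iterable[str], a_cond: Condition, b_cond: Condition) -> List[str]:
    """Hosts that apply A (which depends on B) but cannot find B."""
    return [h for h in hosts if applies(a_cond, h) and not applies(b_cond, h)]


hosts = ["n1", "n2", "n3"]
print(broken_instances(hosts, lambda h: h in {"n1", "n2"}, lambda h: h == "n1"))  # case 1: ['n2']
print(broken_instances(hosts, lambda h: h == "n1", None))  # case 2: []
print(broken_instances(hosts, None, lambda h: h == "n1"))  # case 3: ['n2', 'n3']
```

Case 2 never produces broken instances because B applies everywhere, which matches the reasoning above.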
Based on these situations, we should add some limitations to avoid these
complications. For example: don't gray release two config instances that are
related to each other; gray release the "leaf" config instance first (B in the
example above), make sure it's stable, and then move on to the next one.
Let's take a more concrete example. Alice needs to create a new route that
proxies requests whose URI is prefixed with "/api/v1/trade" to the upstream
"trade-system". First she adds the upstream; no other route in APISIX uses this
upstream yet. Then she creates the route that will use this upstream. But she
isn't sure whether the upstream and the route are absolutely right, so when
creating the route on the APISIX dashboard, she marks it as grayscale so that
only the node named "apigw-sh-1" can apply it. After creating it, she monitors
the behavior on that node for a while. One day later, all related requests on
"apigw-sh-1" meet expectations, so she cancels the grayscale, and now every
APISIX instance applies the route.
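Alice's workflow can be replayed with a small Python sketch. The node list, the route fields (`uri`, `upstream_id`) and the helper names are hypothetical and only loosely follow APISIX's route format; the point is that the set of nodes applying the route goes from just "apigw-sh-1" to every node once the grayscale annotation is removed.

```python
import copy
from typing import Any, Dict, List

NODES = ["apigw-sh-1", "apigw-sh-2", "apigw-bj-1"]  # hypothetical fleet

# Step 1: the upstream has no grayscale and no route references it yet.
upstream: Dict[str, Any] = {"id": "trade-system", "nodes": {"127.0.0.1:8080": 1}}

# Step 2: the route is created under grayscale, limited to "apigw-sh-1".
route: Dict[str, Any] = {
    "uri": "/api/v1/trade/*",
    "upstream_id": "trade-system",
    "annotations": {"grayscale": {"hostname": ["apigw-sh-1"]}},
}


def nodes_applying(config: Dict[str, Any], nodes: List[str]) -> List[str]:
    """Nodes that would apply `config` under hostname-based grayscale."""
    grayscale = config.get("annotations", {}).get("grayscale")
    if not grayscale:
        return list(nodes)
    return [n for n in nodes if n in grayscale.get("hostname", [])]


def cancel_grayscale(config: Dict[str, Any]) -> Dict[str, Any]:
    """Step 3: return a copy of `config` without the grayscale annotation."""
    released = copy.deepcopy(config)
    released.get("annotations", {}).pop("grayscale", None)
    return released


print(nodes_applying(route, NODES))  # ['apigw-sh-1']
print(nodes_applying(cancel_grayscale(route), NODES))  # all three nodes
```

Returning a modified copy instead of mutating the stored route keeps the grayscale and released versions side by side, which makes the before/after comparison explicit.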
Support for configuration grayscale can be rolled out gradually: we may support
core configurations like Route first, let users try this feature, and gather
more feedback.
Chao Zhang