Nice job, @shuyang ^_^ Use a simple case to implement the first version.
Then add more cases later.

On Mon, Nov 16, 2020 at 1:35 PM Ming Wen <[email protected]> wrote:

> Hi, shuyang,
> This is indeed what Apache APISIX needs.
> I'm glad to know you want to lead this :)
>
> Thanks,
> Ming Wen, Apache APISIX PMC Chair
> Twitter: _WenMing
>
>
> Shuyang Wu <[email protected]> wrote on Mon, Nov 16, 2020 at 12:44 PM:
>
> > Hi Community,
> >
> > Nowadays, we have unit tests, integration tests, and e2e tests to ensure
> > the fault tolerance of APISIX. But there are still some problems, like
> > network delay and CPU stress, that are not covered by the above tests.
> > Thus, it would be a good idea to introduce chaos engineering to simulate
> > different types of faults and test the performance of APISIX in these
> > circumstances.
> >
> > To deploy chaos engineering, ChaosMesh[1] could be a good choice for us.
> > It has several benefits over other chaos engineering tools:
> >
> > 1. ChaosMesh is a CNCF sandbox project and has quite an active
> >    community, which suggests the project will keep improving and that
> >    we can get help when needed.
> > 2. ChaosMesh supports GitHub Actions, so once we set up the workflow for
> >    this integration, it would be easy to run the tests in our daily work.
> > 3. ChaosMesh already supports most types of chaos and is adding more.
> >    Although we might not need that many for now, it is a good point
> >    when we decide to test more with it.
> >    BTW, the chaos types ChaosMesh supports[2] for now (Nov. 16, 2020)
> >    include pod chaos, network chaos, stress chaos, io chaos, time chaos,
> >    kernel chaos, HTTP chaos, and DNS chaos.
> >
> > Following the principles of chaos engineering, there are two main parts
> > we need to care about: 1. what we should test and 2. how to prove
> > correctness after chaos injection.
> >
> > As for what we have for now, the current problems we encounter and need
> > to simulate are:
> >
> > 1. the connection with etcd is unstable
> > 2. etcd failure
> > 3. problems when cpu/memory/disk is stressed out
> >
> > And the methods to test correctness include:
> >
> > 1. error log of Nginx and APISIX
> > 2. whether cpu/memory use of APISIX is abnormally high
> > 3. whether wrk benchmarking would fail
> >
> > You are welcome to suggest other problems or correctness checks that
> > you might find useful for this~
> >
> >
> > [1] https://chaos-mesh.org/
> >
> > [2] https://chaos-mesh.org/docs/chaos_experiments
> >
> >
> > Thanks,
> >
> > Shuyang Wu
>

--
*MembPhis*
My GitHub: https://github.com/membphis
Apache APISIX: https://github.com/apache/apisix
