Good to know, thanks.

Thanks,
Ming Wen, Apache APISIX PMC Chair
Twitter: _WenMing


Wo Soyoung <[email protected]> wrote on Tue, Jan 19, 2021 at 3:05 PM:

> Hi Ming,
>
> I don't think that's a problem for us. The targets of Chaos Mesh are all
> virtual machines; we normally limit the chaos scope to a certain
> Kubernetes node or pod, so it won't affect components outside that scope.
>
> Ming Wen <[email protected]> wrote on Tue, Jan 19, 2021 at 2:53 PM:
>
> > I think Prometheus is a good idea: we can get the distribution of HTTP
> > response codes, the number of requests, and so on.
> > But there is a question that needs to be considered: what if the
> > Prometheus service itself crashes during a Chaos Mesh experiment?
> >
> > Thanks,
> > Ming Wen, Apache APISIX PMC Chair
> > Twitter: _WenMing
> >
> >
> > YuanSheng Wang <[email protected]> wrote on Tue, Jan 19, 2021 at 9:42 AM:
> >
> > > Implementing a demo script is useful, agree +1
> > >
> > > > > By 1.20 (Wed): finish writing the demo script, and present the
> > > > > metrics of APISIX with Grafana
> > >
> > > I have one question: why do we need to use Grafana here?
> > > If this runs in CI, it seems easier to query Prometheus directly.
> > >
> > >
> > > On Mon, Jan 18, 2021 at 11:45 PM Shuyang Wu <[email protected]> wrote:
> > >
> > > > Hi Community,
> > > >
> > > > It's a bit embarrassing to resume this feature so late ;( But happily
> > > > I have some new thoughts about it:
> > > >
> > > > After some more investigation into how people use chaos engineering,
> > > > I think that to see how things behave once a given chaos takes
> > > > effect, it would be better to use Prometheus/Grafana to plot APISIX
> > > > performance metrics rather than focusing only on Nginx logs. Also,
> > > > since chaos is mostly about simulating problems faced in production,
> > > > using monitoring tools directly lets us see what users would see.
> > > >
> > > > To use Prometheus, we need a demo that exercises the basic functions
> > > > of APISIX, such as a certain amount of traffic and new rules being
> > > > set at a certain time interval. It seems we do not have that kind of
> > > > demo, so I plan to write a simple script to implement these features.
> > > >
> > > > With monitoring tools and the demo, we could then easily run
> > > > different kinds of chaos and see how things go. When we find
> > > > something interesting and useful, we could standardize it, write a
> > > > test case for the scenario, and put it into CI. Based on the earlier
> > > > experiments, verifying a given case is not that hard, so what we
> > > > should focus on more is finding those interesting scenarios.
> > > >
> > > > A rough time plan would be:
> > > >     By 1.20 (Wed): finish writing the demo script, and present the
> > > >         metrics of APISIX with Grafana
> > > >     By 1.22 (Fri): apply network chaos and see how APISIX works
> > > >         without etcd. Better to test with different chaos cases
> > > >     By 1.24 (Sun): write a test case for the network chaos, and run
> > > >         it on CI
> > > >     Future: more chaos cases!
> > > >
> > > > The most uncertain part for me is the demo: I'm unsure whether we
> > > > already have that kind of demo and, if we don't, about some details
> > > > of writing the script (like what normal traffic for APISIX looks
> > > > like).
> > > > Any suggestions are welcome!!
> > > >
> > > > Best,
> > > > Shuyang
> > > >
> > > > Shuyang Wu <[email protected]> wrote on Mon, Nov 16, 2020 at 12:44 PM:
> > > >
> > > > > Hi Community,
> > > > >
> > > > > Currently we have unit tests, integration tests, and e2e tests to
> > > > > ensure the fault tolerance of APISIX. But there are still some
> > > > > problems, like network delay and CPU stress, that are not covered
> > > > > by the above tests. Thus, it would be a good idea to introduce
> > > > > chaos engineering to simulate different types of faults and test
> > > > > the performance of APISIX in these circumstances.
> > > > >
> > > > > To deploy chaos engineering, ChaosMesh[1] could be a good choice
> > > > > for us. It has several benefits over other chaos engineering tools:
> > > > >
> > > > >    1. ChaosMesh is a CNCF sandbox project and has quite an active
> > > > >    community, which suggests the project will keep improving and
> > > > >    that we could get help when needed.
> > > > >    2. ChaosMesh supports GitHub Actions, so once we set up the
> > > > >    workflow for this integration, it would be easy to run the
> > > > >    tests in our daily work.
> > > > >    3. ChaosMesh already supports most types of chaos and is adding
> > > > >    more. Although we might not need that many for now, it is a
> > > > >    good point when we decide to test more with it.
> > > > >    BTW, the chaos types ChaosMesh supports[2] for now (Nov. 16,
> > > > >    2020) include pod chaos, network chaos, stress chaos, io chaos,
> > > > >    time chaos, kernel chaos, HTTP chaos, and DNS chaos.
> > > > >
> > > > > Following the principles of chaos engineering, there are two main
> > > > > parts we need to care about: 1. what we should test, and 2. how to
> > > > > prove correctness after chaos injection.
> > > > >
> > > > > As for what we have so far, the current problems we encounter and
> > > > > need to simulate are:
> > > > >
> > > > >    1. the connection with etcd is unstable
> > > > >    2. etcd failure
> > > > >    3. problems when cpu/memory/disk stressed out
> > > > >
> > > > > And the methods to test correctness include:
> > > > >
> > > > >    1. error log of Nginx and APISIX
> > > > >    2. whether cpu/memory use of APISIX is abnormally high
> > > > >    3. whether wrk benchmarking would fail
> > > > >
> > > > > You are welcome to suggest other problems or correctness checks
> > > > > that you might find useful for this~
> > > > >
> > > > >
> > > > > [1] https://chaos-mesh.org/
> > > > >
> > > > > [2] https://chaos-mesh.org/docs/chaos_experiments
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Shuyang Wu
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > *MembPhis*
> > > My GitHub: https://github.com/membphis
> > > Apache APISIX: https://github.com/apache/apisix
> > >
> >
>
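
For the "access Prometheus directly in CI" idea raised in the thread, a
minimal sketch of what such a check might look like is given here. It is
not taken from any message above. It assumes the APISIX prometheus plugin
exposes a counter named apisix_http_status with a "code" label in the
standard Prometheus text format; the function names status_code_counts
and error_ratio are hypothetical, and how the metrics text is fetched
(for example over HTTP from the metrics endpoint) is left out on purpose.

```python
import re
from collections import Counter

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def status_code_counts(metrics_text, metric="apisix_http_status"):
    """Sum samples of `metric` per HTTP status code.

    The metric name is an assumption about what APISIX exports.
    """
    counts = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        counts[labels.get("code", "unknown")] += int(float(m.group("value")))
    return counts


def error_ratio(counts):
    """Fraction of counted responses whose status code is 5xx."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in counts.items() if code.startswith("5"))
    return errors / total
```

A CI job could scrape the metrics before and after injecting chaos,
compare the two error ratios, and fail the build when the difference
exceeds a threshold the project chooses.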
