[Architecture] [IS] On Board Diagnostics Tool for IS

Thumilan Mikunthan Thu, 06 Sep 2018 02:56:50 -0700

Hi all,

*Problem*


Whenever an error occurs, certain diagnostic actions (which depend on the
error) can help to diagnose it.

For example,

   - If an OOM (Out Of Memory) error occurs, a heap dump helps to analyse a
     memory leak.
   - If some threads are blocked abnormally, analysing a thread dump can help
     to solve the problem.
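
As a rough illustration of the two actions above, the following Java sketch
captures a heap dump and a thread dump of the JVM it runs in. It uses the
standard java.lang.management MXBeans and HotSpot's HotSpotDiagnosticMXBean.
The class name DumpActions and the output file names are hypothetical.
Because the proposed tool is standalone, a real version would have to target
the IS process instead, for example with "jcmd <pid> GC.heap_dump" and
"jcmd <pid> Thread.print" or over remote JMX. This sketch only shows what the
two dumps contain.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: dumps of the *current* JVM, for illustration only. */
final class DumpActions {

    private DumpActions() {}

    /** Writes a heap dump of live objects; HotSpot expects a new file ending in ".hprof". */
    static Path heapDump(Path dir) throws IOException {
        Path out = dir.resolve("heap.hprof");
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(out.toAbsolutePath().toString(), true); // true = live objects only
        return out;
    }

    /** Writes every thread's state, stack and held locks (useful for blocked threads). */
    static Path threadDump(Path dir) throws IOException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append(info); // ThreadInfo.toString() prints at most 8 frames per thread
        }
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no deadlock exists
        if (deadlocked != null) {
            sb.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append('\n');
        }
        Path out = dir.resolve("threads.txt");
        Files.write(out, sb.toString().getBytes(StandardCharsets.UTF_8));
        return out;
    }
}
```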

But in a real scenario, performing these diagnostic actions manually may not
be possible because

   - It is impossible to predict when the error will occur.
   - The diagnostic actions vary with the error, and users cannot be expected
     to know about every error scenario.
   - Users would rather get help from the support team than solve the error
     themselves.


*Solution*

Design a standalone tool with a small memory footprint (<8%) and low CPU
usage (<8%) that follows the workflow below.

   - The Log Tailer tails the carbon.log file in real time.
   - The Match Rule Engine checks whether the current log line matches any of
     the error regexes.
      - The tool has to read the error regexes from a separate XML file.
   - The Interpreter identifies the error type and performs the actions
     associated with that error.
      - Each action should be handled by a separate action executor.
      - The mapping between errors and actions should be defined in a
        separate XML file.
   - All the diagnostic files (e.g. thread dumps and heap dumps) for a
     particular error should be created under one folder, and that folder
     should then be zipped.
      - Each folder can be identified by its timestamp.
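
A minimal sketch of the Log Tailer follows. It assumes a simple polling
design rather than file-system notifications. The tailer remembers the byte
offset it has already read and returns only complete new lines. If carbon.log
shrinks (for example after truncation), it starts over. LogTailer and poll()
are hypothetical names and are not part of any existing WSO2 API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical polling tailer for carbon.log (a sketch, not a WSO2 API). */
final class LogTailer {
    private static final int MAX_CHUNK = 1 << 20; // read at most 1 MiB per poll to bound memory

    private final Path file;
    private long position;                // byte offset already consumed
    private byte[] pending = new byte[0]; // bytes of a line that has no '\n' yet

    LogTailer(Path file, boolean startAtEnd) throws IOException {
        this.file = file;
        this.position = startAtEnd ? Files.size(file) : 0L;
    }

    /** Returns the complete lines appended since the previous call (possibly none). */
    List<String> poll() throws IOException {
        List<String> lines = new ArrayList<>();
        long size = Files.size(file);
        if (size < position) { // file shrank: assume truncation/rotation and start over
            position = 0L;
            pending = new byte[0];
        }
        if (size == position) {
            return lines;
        }
        byte[] chunk = new byte[(int) Math.min(size - position, MAX_CHUNK)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            raf.readFully(chunk);
        }
        position += chunk.length;

        // Prepend the unfinished line from the last poll, then split on '\n'
        // (safe for UTF-8, since byte 0x0A never occurs inside a multi-byte character).
        byte[] data = Arrays.copyOf(pending, pending.length + chunk.length);
        System.arraycopy(chunk, 0, data, pending.length, chunk.length);
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                int end = (i > start && data[i - 1] == '\r') ? i - 1 : i; // drop CRLF's '\r'
                lines.add(new String(data, start, end - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        pending = Arrays.copyOfRange(data, start, data.length);
        return lines;
    }
}
```

The tool would call poll() in a loop with a short sleep between calls and pass
each returned line to the Match Rule Engine. Polling keeps the CPU cost easy
to bound. Detecting rotation only by a shrinking file size is a
simplification, because a rotated file that is already larger than the old
offset would be missed.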
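
The Match Rule Engine could look roughly like this. The sketch assumes a
hypothetical rules file in which each error element carries an id attribute
and a regex attribute; the proposal itself does not define a schema. The
patterns are compiled once at load time. Each log line is then checked with
Matcher.find(), so a regex may match anywhere in the line. MatchRuleEngine,
load() and match() are illustrative names only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical rule engine. Assumed rules file format:
 *   <errors>
 *     <error id="OOM" regex="java\.lang\.OutOfMemoryError"/>
 *   </errors>
 */
final class MatchRuleEngine {
    private final Map<String, Pattern> rules = new LinkedHashMap<>(); // id -> regex, file order

    static MatchRuleEngine load(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so the rules file cannot pull in external entities.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        NodeList nodes = f.newDocumentBuilder().parse(xml).getElementsByTagName("error");
        MatchRuleEngine engine = new MatchRuleEngine();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String id = e.getAttribute("id");
            String regex = e.getAttribute("regex");
            if (id.isEmpty() || regex.isEmpty()) { // an empty regex would match every line
                throw new IllegalArgumentException("error rule needs id and regex: #" + i);
            }
            engine.rules.put(id, Pattern.compile(regex));
        }
        return engine;
    }

    /** Returns the id of the first rule whose regex occurs in the line, if any. */
    Optional<String> match(String line) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(line).find()) {
                return Optional.of(rule.getKey());
            }
        }
        return Optional.empty();
    }
}
```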
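
The following is a sketch of the Interpreter and the action executors. It
assumes that the error-to-action mapping has already been read from its XML
file into a Map. ActionExecutor and Interpreter are hypothetical names. For
each detected error, the sketch does three things:

   - It creates a folder named after the error id and a timestamp.
   - It lets every mapped executor write its files into that folder.
   - It zips the folder with java.util.zip.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical: one diagnostic action (heap dump, thread dump, log copy, ...). */
interface ActionExecutor {
    void execute(Path incidentDir) throws IOException;
}

/** Hypothetical Interpreter: runs the actions mapped to an error and zips the results. */
final class Interpreter {
    // Timestamp down to the millisecond so each incident gets its own folder.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss-SSS");

    private final Map<String, List<ActionExecutor>> actionsByError; // built from the mapping XML
    private final Path outputRoot;

    Interpreter(Map<String, List<ActionExecutor>> actionsByError, Path outputRoot) {
        this.actionsByError = actionsByError;
        this.outputRoot = outputRoot;
    }

    /** Runs every executor mapped to errorId in a new timestamped folder; returns the zip. */
    Path handle(String errorId) throws IOException {
        Path dir = outputRoot.resolve(errorId + "-" + LocalDateTime.now().format(STAMP));
        Files.createDirectories(dir);
        for (ActionExecutor action : actionsByError.getOrDefault(errorId, Collections.emptyList())) {
            action.execute(dir);
        }
        return zip(dir);
    }

    /** Zips the regular files under dir into a sibling "<dir>.zip", keeping relative paths. */
    private static Path zip(Path dir) throws IOException {
        Path zipFile = dir.resolveSibling(dir.getFileName() + ".zip");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : files) {
                // ZIP entry names use '/' regardless of the host OS.
                zos.putNextEntry(new ZipEntry(dir.relativize(p).toString().replace('\\', '/')));
                Files.copy(p, zos);
                zos.closeEntry();
            }
        }
        return zipFile;
    }
}
```

Keeping each action behind a small interface means a new diagnostic (for
example, copying carbon.log into the folder) would only need one executor
class and one mapping entry. The Interpreter itself would not change.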


*Architecture Diagram*
[image: ArchitectureDiagram.png]

*Sample Scenario*

Assume that a client reports an OOM error. The client usually attaches the
carbon.log file to the issue, but to solve the problem the support team needs
a thread dump and a heap dump, so the team asks the client to take those dumps
the next time the error occurs. The client has to wait for the error to recur
and then take the dumps. (We can’t expect the client to watch the server all
the time and take dumps when the error occurs. What if the next error occurs
at midnight?) Meanwhile, the support team has to wait for an update on the
issue, so they put it on hold and move on.

Now consider the same scenario with this tool. Once the error occurs, the tool
takes the necessary diagnostic actions and zips the folder. The client can
upload that zip file with the issue, so the support team doesn’t need to ask
the client to perform those diagnostic actions. The support team is able to
work on the issue directly without waiting for updates from the client.

The next time the error occurs (even at midnight), the tool can detect it and
send the necessary files directly to the support team for further analysis.

Since the tool’s memory footprint is small, the client can run it without
any objection.

The tool reduces the client’s involvement in handling WSO2 IS errors so that
the client can focus on their business. It also reduces the time needed to
solve an issue, because the support team can get all the necessary diagnostic
files at once, in the initial conversation.

Please give feedback regarding this architecture.

Best Regards,
M.Thumilan

_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
