On Thu, Sep 6, 2018 at 4:15 PM, Sinthuja Rajendran <[email protected]> wrote:
> Hi,
>
> I have a few questions/concerns, as stated below.
>
> 1) In our WSO2 server startup script, we already have the JVM options below [1], which create a heap dump when the server goes OOM. Therefore, I believe you are trying to solve the problem where the server continues to run even though there is an OOM. IMHO, logs are not a suitable mechanism for finding out whether the system has gone OOM, because we cannot be certain that every OOM error produces a log entry. Also, with the proposed method we can only address the problem after it has occurred (i.e., after a system outage), and we cannot prevent it. IMHO, running a system/JVM monitoring tool that alerts when memory usage exceeds some percentage is a better solution to this problem.
>
> 2) Thread dumps are mostly related to slow responses (sometimes no response) from the server, and I'm not sure how we can get these details from the logs. We would also need to handle the logs intelligently: a request timeout does not necessarily mean that we need to take a thread dump; it could simply be that some backend service is down.
>
> 3) We have carbon-dump.sh, which can dump the thread dump, heap dump, and other relevant details about the server. Can't we use that for this purpose?
>

Yes! Looking at the logs to take heap/thread dumps would not help much. Is there any other dump or data that you are hoping to zip?

Also, how is this specific to IS? Is there any special diagnosis you are hoping for in IS? If so, what is it?

Thanks,
Asela.

>
> [1] -XX:+HeapDumpOnOutOfMemoryError \
> -XX:HeapDumpPath="$RUNTIME_HOME/logs/heap-dump.hprof" \
>
> Thanks,
> Sinthuja.
>
> On Thu, Sep 6, 2018 at 3:25 PM Thumilan Mikunthan <[email protected]>
> wrote:
>
>> Hi all,
>>
>> *Problem*
>>
>> Whenever an error occurs, certain diagnostic actions (depending on the error) can help diagnose it.
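The heap and thread dumps discussed in this thread can both be taken from a running JVM with the JDK's `jcmd` tool (`GC.heap_dump` and `Thread.print`). A minimal Python sketch of such diagnostic actions, assuming `jcmd` is on the PATH, that it runs as the same OS user as the server, and that the server's process ID is already known; all function names are hypothetical:

```python
import subprocess
from pathlib import Path


def heap_dump_cmd(pid: int, target: Path) -> list[str]:
    """jcmd command that writes an .hprof heap dump of the JVM with this pid."""
    # The target JVM opens the file itself, so a relative path would be
    # resolved against the server's working directory; pass an absolute one.
    return ["jcmd", str(pid), "GC.heap_dump", str(target.resolve())]


def thread_dump_cmd(pid: int) -> list[str]:
    """jcmd command that prints the stack traces of all threads in the JVM."""
    return ["jcmd", str(pid), "Thread.print"]


def take_heap_dump(pid: int, out_dir: Path) -> Path:
    target = out_dir / "heap-dump.hprof"
    subprocess.run(heap_dump_cmd(pid, target), check=True)
    return target


def take_thread_dump(pid: int, out_dir: Path) -> Path:
    target = out_dir / "thread-dump.txt"
    # Thread.print writes to jcmd's stdout, so redirect it into the file.
    with open(target, "w", encoding="utf-8") as f:
        subprocess.run(thread_dump_cmd(pid), stdout=f, check=True)
    return target
```

For example, `take_thread_dump(12345, Path("."))` would write `thread-dump.txt` to the current directory, where 12345 stands for the server's real PID. Keeping command construction separate from execution makes the commands easy to inspect or log before anything touches the running server.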
>>
>> For example,
>>
>> - If an OOM (Out Of Memory) error occurs, a heap dump will help analyse the memory leak.
>> - If some threads are blocked abnormally, analysing a thread dump could help solve the problem.
>>
>> But in a real scenario, performing these diagnostic actions manually may not be possible, because:
>>
>> - It is impossible to predict when the error will occur.
>> - The diagnostic actions vary depending on the error, and we cannot expect the user to know about every error scenario.
>> - The user would rather get help from the support team than solve the error himself/herself.
>>
>>
>> *Solution*
>>
>> Design a standalone tool with a small memory footprint (<8%) and low CPU usage (<8%) that has the following workflow:
>>
>> - The Log Tailer tails the carbon.log file in real time.
>> - The Match Rule Engine checks whether the current log line matches an error regex.
>>   - The tool should read the error regexes from a separate XML file.
>> - The Interpreter identifies the error type and performs the actions for that error.
>>   - Each action should be handled by a separate action executor.
>>   - The mapping between errors and actions should be defined in a separate XML file.
>> - All the diagnostic files (e.g., thread dumps and heap dumps) for a particular error should be created under one folder, which is then zipped.
>>   - Each folder can be identified by its timestamp.
>>
>>
>> *Architecture Diagram*
>> [image: ArchitectureDiagram.png]
>>
>> *Sample Scenario*
>>
>> Assume that a client reports an issue about an OOM error. The client usually attaches the carbon.log file to the issue. But to solve the problem, the support team needs a thread dump and a heap dump, so the team asks the client to take those dumps the next time. The client has to wait for the next occurrence to take them. (We can't expect the client to watch the server all the time and take dumps when the error occurs. What if the next error occurs at midnight?)
>> The support team has to wait for an update on that issue, so they put the issue on hold and move on.
>>
>> Now consider the above scenario with this tool. Once the error occurs, the tool takes the necessary diagnostic actions and zips the folder. The client can upload that zip file with the issue, so the support team does not need the client to perform those diagnostic actions himself. The support team can work on the issue directly without waiting for updates from the client.
>>
>> The next time the error occurs (even at midnight), the tool can detect it and send the necessary files directly to the support team for further analysis.
>>
>> Since the tool's memory footprint is small, the client can run it without any objection.
>>
>> The tool reduces the client's involvement in WSO2 IS errors so that the client can focus on their business. It also reduces the time needed to solve an issue, because the support team can get all the necessary diagnostic files at once, in the initial conversation.
>>
>> Please give your feedback on this architecture.
>>
>> Best Regards,
>> M.Thumilan
>>
>
>
> --
> *Sinthuja Rajendran*
> Senior Technical Lead
> WSO2, Inc.: http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>

--
Thanks & Regards,
Asela
Mobile : +94 777 625 933
http://soasecurity.org/
http://xacmlinfo.org/
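The workflow in the original proposal (a Log Tailer on carbon.log, a Match Rule Engine driven by error regexes from one XML file, an Interpreter that runs the actions mapped to each error in a second XML file, and one timestamped, zipped folder per error) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the XML layouts, the `OOM` rule and its regex, and all function names are hypothetical, and an action executor is any callable that takes the output folder and writes its files there (for example, a wrapper around `jcmd`).

```python
import os
import re
import shutil
import time
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical XML layouts: the proposal only says that error regexes and the
# error-to-action mapping each live in a separate XML file.
ERRORS_XML = r"""
<errors>
  <error name="OOM" regex="java\.lang\.OutOfMemoryError"/>
</errors>
"""
ACTIONS_XML = """
<mappings>
  <mapping error="OOM">
    <action>heap_dump</action>
    <action>thread_dump</action>
  </mapping>
</mappings>
"""

# An action executor writes its diagnostic file(s) into the folder it is given.
Executor = Callable[[Path], None]


def load_rules(errors_xml: str, actions_xml: str):
    """Return (error name -> compiled regex, error name -> list of action names)."""
    errors = {e.get("name"): re.compile(e.get("regex"))
              for e in ET.fromstring(errors_xml.strip()).iter("error")}
    actions = {m.get("error"): [a.text.strip() for a in m.iter("action")]
               for m in ET.fromstring(actions_xml.strip()).iter("mapping")}
    return errors, actions


def follow(path: str, poll_interval: float = 1.0) -> Iterator[str]:
    """Log Tailer: yield complete lines appended to `path`, like `tail -f`.
    Log rotation is not handled in this sketch."""
    with open(path, encoding="utf-8", errors="replace") as f:
        f.seek(0, os.SEEK_END)  # only react to lines written from now on
        buf = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)
                continue
            buf += chunk
            if buf.endswith("\n"):  # the writer may have flushed half a line
                yield buf.rstrip("\n")
                buf = ""


def match(line: str, errors: dict[str, re.Pattern]) -> Optional[str]:
    """Match Rule Engine: name of the first error whose regex matches `line`."""
    for name, pattern in errors.items():
        if pattern.search(line):
            return name
    return None


def collect(error: str, actions: dict[str, list[str]],
            executors: dict[str, Executor], base_dir: Path) -> str:
    """Interpreter: run the mapped actions into a timestamped folder, then zip it."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    folder = base_dir / f"{error}-{stamp}"
    folder.mkdir(parents=True)
    for name in actions.get(error, []):
        executors[name](folder)  # KeyError here means no executor is registered
    # Creates <folder>.zip next to the folder and returns its path.
    return shutil.make_archive(str(folder), "zip", root_dir=str(folder))


def process(lines: Iterable[str], errors, actions, executors,
            base_dir: Path) -> Iterator[str]:
    """Yield the path of one zip archive per log line that matches an error."""
    for line in lines:
        error = match(line, errors)
        if error is not None:
            yield collect(error, actions, executors, base_dir)
```

A possible wiring is `process(follow("carbon.log"), errors, actions, executors, Path("diagnostics"))`, where "carbon.log" stands for the real log path and `executors` maps `heap_dump` and `thread_dump` to callables that take the folder. Separating `follow` from `process` lets the matching and zipping run on any iterable of lines, which makes them testable without a live server. The sketch fires on every match; as Sinthuja notes, a request timeout does not always justify a thread dump, so a real tool would also need rules for when not to act (for example, a cool-down per error type).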
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
