Yes: ITSM 7.1 P6 is running. I am looking to upgrade patch levels, without any promises.. but I gotta test on a dev system first.. which is another issue of issues...
Thanks guys.. I am glad you all caught one thing I did not check.. and I am not out of my mind.. yet!

On Fri, Sep 23, 2011 at 2:07 PM, Ben Chernys <[email protected]> wrote:

> OK. So, it ain’t a denied malloc. You got a bug that somehow you are exposing and seemingly no one else is. Tough one. BMC could give you a debug build, you’d generate a core, and give it to some BMC folks. That would require some very high level of support I would assume (though I have seen that ages ago).
>
> I still think the best bet is to trace and hunt around filters to see if anything unusual is the culprit. As Mark said, make sure your system is taking cores. You have a process ending with SIGSEGV; that always ends up with a core. I’ve never seen a system where it didn’t but then I am no Unix admin.
>
> There are other options such as plugging in stuff to log system status at intervals etc. But your best bet is definitely to turn on logging and review logs. Try to reproduce this in a safe (non-production) environment.
>
> BTW, the OS or armonitor doesn’t terminate the process. This is a hardware interrupt that the OS catches. ARS will have a signal handler and that gets control. That is what prints the Signal 11 trace.
>
> Are you running ITSM? You didn’t say. But code paths within the arserver will vary even with the same ARS applications for so many reasons as to make it astronomical.
>
> Were there any filter / db / environment changes just before the symptoms started showing up? These are good areas to investigate as well.
>
> These types of problems can be solved (circumvented) quickly or can take forever (i.e. not get solved). But they are quite involved and interesting.
>
> Good luck!
>
> Ben
>
> *From:* Action Request System discussion list(ARSList) [mailto:[email protected]] *On Behalf Of *patrick zandi
> *Sent:* September-23-11 15:38
> *To:* [email protected]
> *Subject:* Re: ARS 7.1 P6 Server -- 4 days restarting (possible memory OS 32bit issue) signal is 11
>
> Perplexing is the (no core bombs).
> We are running fine and then Boom, we get that signal, and armonitor shuts the arserverd down and restarts a new one..
> What I was hoping was someone would say: I remember that.. and Patch X,Y,Z fixed that..
>
> I did complete logging and the log stops and the log starts (nothing after the signal and startup).. Clear.. Clean..
>
> I looked at all the patches and find the following, but cannot put a finger on any one of them specifically (NL = not likely, ML = most likely):
>
> SW00351599 The AR System server crashed when a filter used converted values in a Set Field action.
> SW00351922 The AR System server program terminated when building a userList to send notifications.
> ML-SW00356647 Too many filters executed recursively causing a stack overflow, which resulted in failure of the AR System server.
> SW00328336 The AR System server crashed while saving text to a Diary field on which audit was enabled.
> SW00328337 arserverd crashed while attempting to communicate with the plug-in server through PluginServerCallWithRetry.
> SW00337127 A memory leak issue occurred in the AR System server.
> NL-SW00338411 The AR System server crashed during an archiving process if the archive source form contained non-data fields such as Text, Trim, Button, and so on.
> NL-SW00346370 The AR System server crashed while processing an error handling filter.
> SW00314816 When a user performed a search on a Join form, but did not have permissions to view all records returned in the result, the AR System server
> crashed.
> NL-SW00322802 Creating an entry with a user name larger than 180 bytes caused the AR System server to crash when the status history was being recorded and the initial status was not New.
>
> On Fri, Sep 23, 2011 at 5:09 AM, Ben Chernys <[email protected]> wrote:
>
> Hi Mark, Patrick,
>
> Signal 11 is SIGSEGV which is not necessarily a malloc failure though indeed a malloc failure may lead to it. It is not always possible to log malloc failures – after all it takes some memory to cut a log record.
>
> A segmentation violation is always the result of bad code (accessing memory not allocated to the process or not in the process's address space – for which address 0 is a candidate (malloc’s return value on failure)).
>
> That being said, it is possible to not trigger the execution path with that bad code by altering filters etc, so definitely the route to go on is along the lines that Mark talked: the core is always a wealth of info – even though ARS will not have debugging compiled in ;-) I would also turn on all logging. SQL, API, Filter on the server, and unlimited, and pointing to the same file until the next occurrence. Then you will have a wealth of ARS information to go through. Generally something will stand out.
>
> Recursive filter loops are usually trapped by the maximum filter limit – though if that is set high enough the process will run out of memory before hitting up against that. If yours is high, you could try setting it lower.
>
> You may also want to go to a higher patch level if one is available. I am no longer that familiar with the patches available on 7.1.
>
> Also, I know that memory on Solaris may be restricted by the admin. (I forget the commands to determine this – but they will be easily found on the web).
> ulimits Perhaps?
>
> Cheers
>
> Ben Chernys
> Senior Software Architect
> Software Tool House Inc.
> Canada / Deutschland / Germany
> Mobile: +49 171 380 2329 GMT + 1 + [ DST ]
> Email: Ben.Chernys _AT_ softwaretoolhouse.com <[email protected]>
> Web: www.softwaretoolhouse.com
>
> Check out Software Tool House's free Diary Editor.
> *Meta-Update,* our premium ARS Data tool, lets you automate your imports, migrations, *in no time at all*, without programming, without staging forms, without merge workflow.
> http://www.softwaretoolhouse.com/
>
> *From:* Action Request System discussion list(ARSList) [mailto:[email protected]] *On Behalf Of *Walters, Mark
> *Sent:* September-23-11 09:08
> *To:* [email protected]
> *Subject:* Re: ARS 7.1 P6 Server -- 4 days restarting (possible memory OS 32bit issue) signal is 11
>
> It may be memory but I would expect to see malloc errors (ARERR 300) in the arerror.log if this was the case. The fact you’re not seeing a stack trace like this;
>
> Mon Sep 20 08:33:52 2010 6
> Timestamp: Mon Sep 20 2010 08:33:52.1865
> Thread Id: 4
> Version: 7.1.00 Patch 009 201009200800
> ServerName: test71
> Database: SQL -- Oracle
> Hardware: sun4u
> OS: SunOS 5.10
> RPC Id: 337
> RPC Call: 106 (GLXS)
> RPC Queue: 390600
> Client: User Demo from Remedy Administrator (protocol 13) at IP address 192.168.1.54
> Form:
> Logging On:
>
> suggests it may be a recursive filter – on Solaris this often causes a crash without logging anything useful. Check to see whether there are any core files in the server/bin directory as this is another symptom of this type of crash on Solaris.
> If cores are enabled (check with the OS coreadm command) then the server may create them even though you’re not running a debug build.
>
> If you do have some core files then run the pstack command against them (pstack core) and you will be able to see the stack of each thread within the server – if it is a recursive filter causing a stack overflow then one of the threads should stand out as being much bigger than the others. Depending on what you see you may then need to enable FILTER/SQL logging to try and capture the workflow that is causing the crash. It’s also worth checking the Filter-Max-Stack value in ar.conf – various installers set this to a very high value – try reducing it back down to 50 or so and this should stop most filter recursion crashes and log an error instead.
>
> Mark
>
> I work for BMC, I don’t speak for them.
>
> *From:* Action Request System discussion list(ARSList) [mailto:[email protected]] *On Behalf Of *patrick zandi
> *Sent:* 22 September 2011 21:07
> *To:* [email protected]
> *Subject:* ARS 7.1 P6 Server -- 4 days restarting (possible memory OS 32bit issue) signal is 11
>
> Just a quick question: ARS 7.1 P6 on Solaris 10. I am seeing the operating system telling ARS to shut down about every 4-6 days.. not positive, nothing in the debugging logs at all, only in the ARMONITOR.log where it says..
>
> 2011 ARMonitor child process (pid:15277) died with 11. And the signal is 11.
> ./arserverd
>
> Can I assume Signal 11 is memory? I have seen a lot of memory issues with a signal 11 in the arslist...
>
> --
> Patrick Zandi

--
Patrick Zandi
_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug11 www.wwrug.com ARSList: "Where the Answers Are"
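
For anyone triaging a similar restart loop, two of the checks discussed in this thread can be sketched in a few lines of Python: finding the child-death records in armonitor.log, and seeing whether Filter-Max-Stack in ar.conf sits well above the "50 or so" Mark suggests. This is only a rough sketch, not a BMC tool. The function names are made up for illustration, the log pattern is modelled on the single armonitor line quoted above, and the `Name: value` layout assumed for ar.conf should be checked against your own file.

```python
import re
import signal

# Modelled on the one armonitor.log line quoted in the thread:
#   "ARMonitor child process (pid:15277) died with 11."
# Assumption: other ARS versions may word this line differently.
DEATH_RE = re.compile(r"child process \(pid:(\d+)\) died with (\d+)")

# Assumption: ar.conf holds options as "Name: value", one per line,
# and lines starting with "#" are comments (so they will not match).
FILTER_MAX_RE = re.compile(r"^\s*Filter-Max-Stack\s*:\s*(\d+)\s*$", re.MULTILINE)


def find_child_deaths(log_text):
    """Return (pid, signal_number, signal_name) for each death record found."""
    deaths = []
    for pid, signum in DEATH_RE.findall(log_text):
        try:
            # Translate the number into the platform's signal name.
            name = signal.Signals(int(signum)).name
        except ValueError:
            name = "unknown"
        deaths.append((int(pid), int(signum), name))
    return deaths


def filter_max_stack(conf_text):
    """Return the configured Filter-Max-Stack value, or None if it is not set."""
    match = FILTER_MAX_RE.search(conf_text)
    return int(match.group(1)) if match else None


def stack_limit_is_high(conf_text, threshold=50):
    """True if Filter-Max-Stack is set above the threshold (50 per the thread)."""
    value = filter_max_stack(conf_text)
    return value is not None and value > threshold
```

Read the two files into strings and pass them to `find_child_deaths` and `stack_limit_is_high`. A high limit does not prove a recursive filter is the culprit; it only means the server would not stop the recursion before the stack overflows.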

