Thanks, Ben.

We're having problems determining where the signal 11 is coming from.
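One possible way to narrow this down, offered only as a rough sketch: pull the faulting program counter and address out of the truss output quoted below, so they can be compared against the process's memory map. The Python below assumes the line shapes seen in that excerpt (other truss versions may print them differently), and `find_faults`, `FAULT_RE`, and `SIGINFO_RE` are illustrative names, not part of truss or ARS.

```python
import re

# Patterns modelled on the truss lines quoted in this thread (an assumption;
# other truss versions may format these lines differently).
# The leading "/11:" is, as far as I know, the LWP (thread) id, not the signal.
FAULT_RE = re.compile(
    r"^/(\d+):\s+Incurred fault #(\d+), (\w+)\s+%pc\s*=\s*(0x[0-9A-Fa-f]+)"
)
SIGINFO_RE = re.compile(
    r"^/(\d+):\s+siginfo: (\w+) (\w+) addr=(0x[0-9A-Fa-f]+)"
)


def find_faults(lines):
    """Return one dict per 'Incurred fault' line.

    The faulting address is taken from the first siginfo line that follows
    on the same LWP; later repeats of that siginfo line are ignored.
    """
    faults = []
    for raw in lines:
        line = raw.strip()
        m = FAULT_RE.match(line)
        if m:
            faults.append({
                "lwp": int(m.group(1)),
                "fault": m.group(3),
                "pc": m.group(4),
                "signal": None,
                "code": None,
                "addr": None,
            })
            continue
        m = SIGINFO_RE.match(line)
        if (m and faults
                and faults[-1]["lwp"] == int(m.group(1))
                and faults[-1]["addr"] is None):
            faults[-1].update(
                signal=m.group(2), code=m.group(3), addr=m.group(4)
            )
    return faults


# The four fault-related lines from the truss excerpt in this thread.
SAMPLE = r"""
/11: Incurred fault #6, FLTBOUNDS %pc = 0xFE6A3558
/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
/11: Received signal #11, SIGSEGV [caught]
/11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
"""

if __name__ == "__main__":
    for f in find_faults(SAMPLE.splitlines()):
        print(f["lwp"], f["fault"], f["pc"], f["signal"], f["code"], f["addr"])
```

Run over a full truss log, this would list every fault with its pc and address. The pc value could then be checked against the process address map (for example, `pmap` output on Solaris) to see which library or segment the fault lands in.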

On Fri, Nov 13, 2009 at 1:49 PM, Ben Chernys <ben.cher...@softwaretoolhouse.com> wrote:

> PS. The 91 is a red herring. It's the Sig 11 (SEGV) you need to worry
> about. The 91 is another process not being able to communicate with the
> arserverd process.
>
> Cheers
> Ben
>
> ------------------------------
> *From:* Ben Chernys [mailto:ben.cher...@softwaretoolhouse.com]
> *Sent:* November 13, 2009 8:42 PM
> *To:* 'arslist@ARSLIST.ORG'
> *Subject:* RE: Prod server down - services will not stay up
>
> The signal 11 is bad code - simple as that. It's a "segmentation
> violation" which means that the server (arserverd) attempted to read or
> write to an address not allocated to its virtual space. It can also be
> caused by a double free or two pointers to one block which has been freed.
> In any event, you cannot fix this without the ARS source code which I expect
> you would find hard to get.
>
> That being said, the easiest way to determine (and then circumvent) these
> types of things is to turn on SQL logging on the server before the system
> starts (through the ar.conf file). The exact settings are in the
> configuring ARS guide.
>
> Then, when the blow up happens, see what the server was attempting to do.
> You can usually spot some possible internal database inconsistencies (in ARS
> meta-data) in this way and then repair them manually through SQL before the
> ARS start-up.
>
> Additionally, there may be patches available that address the problem.
>
> Cheers
> Ben Chernys
>
> ------------------------------
> *From:* Action Request System discussion list(ARSList)
> [mailto:arsl...@arslist.org] *On Behalf Of* Susan Palmer
> *Sent:* November 13, 2009 8:30 PM
> *To:* arslist@ARSLIST.ORG
> *Subject:* Prod server down - services will not stay up
>
> Help !!
>
> Working with support but could use anyone else's input. I'm at WWRUG so
> it's somewhat limiting.
>
> We did a truss log and when the services drop (arerror 91) we see the
> following:
> 167
> /11: read(54, "\0FE\0\006\0\0\0\0\01017".., 2064) = 254
> /11: write(54, "\0A1\0\006\0\0\0\0\003 ^".., 161) = 161
> /11: read(54, "\0F7\0\006\0\0\0\0\01017".., 2064) = 247
> /11: Incurred fault #6, FLTBOUNDS %pc = 0xFE6A3558
> /11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
> /11: Received signal #11, SIGSEGV [caught]
> /11: siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
>
> The services do restart automatically so armonitor is doing its job.
> We've commented out everything from armonitor but the arserverd command.
>
> We stay up for between 2-10 minutes and then wham, we're down again.
> Obviously this just started this morning.
>
> unix sun solaris 10
> oracle 10g
> ars 7.0.1P2
>
> They did expand the database size last night if that has any bearing. But
> we can connect to the database successfully when ar is down.
>
> Nothing else helpful in arerror.log, only 91 error.
>
> I'm at the Hardrock hotel, call room 30601 if you have questions or can
> help!
>
> Thanks,
> Susan
>
> _Platinum Sponsor: rmisoluti...@verizon.net ARSlist: "Where the Answers
> Are"_

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org