In arerror.log you will see a line that contains nothing except the 11.
It comes from the arserverd daemon.
 

  _____  

From: Action Request System discussion list(ARSList)
[mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
Sent: November 13, 2009 8:57 PM
To: arslist@ARSLIST.ORG
Subject: Re: Prod server down - services will not stay up


Thanks Ben
 
We're having problems determining where the 11 is coming from.


On Fri, Nov 13, 2009 at 1:49 PM, Ben Chernys
<ben.cher...@softwaretoolhouse.com> wrote:


 
PS.  The 91 is a red herring.  It's the Sig 11 (SEGV) you need to worry
about.   The 91 is another process not being able to communicate with the
arserverd process.
 
Cheers
Ben

  _____  

From: Ben Chernys [mailto:ben.cher...@softwaretoolhouse.com] 
Sent: November 13, 2009 8:42 PM
To: 'arslist@ARSLIST.ORG'
Subject: RE: Prod server down - services will not stay up


The signal 11 is bad code - simple as that.  It's a "segmentation violation"
which means that the server (arserverd) attempted to read or write to an
address not allocated to its virtual space.  It can also be caused by a
double free or two pointers to one block which has been freed.  In any
event, you cannot fix this without the ARS source code which I expect you
would find hard to get.
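
As a rough illustration of how a monitoring process sees a signal 11 death, here is a minimal Python sketch for a Unix-like system. It does not reproduce the arserverd bug: the child simply sends SIGSEGV to itself instead of touching a bad address, and the names run_and_report and CHILD are made up for this example.

```python
import signal
import subprocess
import sys

def run_and_report(argv):
    """Run a child process and describe how it ended."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code means "terminated by that signal".
        sig = signal.Signals(-proc.returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {proc.returncode}"

# Hypothetical stand-in for a crashing server: the child signals itself.
CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
```

Calling run_and_report(CHILD) reports the signal number and name, which is the same information the truss output and the bare 11 in the error log are giving you.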
 
That being said, the easiest way to determine (and then circumvent) these
types of things is to turn on SQL logging on the server before the system
starts (through the ar.conf file).  The exact settings are in the
configuring ARS guide.
 
Then, when the blow up happens, see what the server was attempting to do.
You can usually spot some possible internal database inconsistencies (in ARS
meta-data) in this way and then repair them manually through SQL before the
ARS start-up.
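
A minimal sketch of the "see what the server was attempting to do" step: keep only the last few non-empty lines written to the SQL log before the crash. The function name and the default of 20 lines are assumptions for illustration, not ARS settings, and nothing here depends on the exact log line format.

```python
from collections import deque

def last_entries(lines, count=20):
    """Return the last `count` non-empty lines, oldest first."""
    recent = deque(maxlen=count)  # older entries drop off automatically
    for line in lines:
        text = line.rstrip("\r\n")
        if text.strip():
            recent.append(text)
    return list(recent)
```

To use it on a real log, pass an open file object for whatever SQL log file you configured in ar.conf. The statements at the very end are the ones to compare against the ARS meta-data tables.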
 
Additionally, there may be patches available that address the problem.
 
Cheers
Ben Chernys
 
 

  _____  

From: Action Request System discussion list(ARSList)
[mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
Sent: November 13, 2009 8:30 PM
To: arslist@ARSLIST.ORG
Subject: Prod server down - services will not stay up


Help !!
 
Working with support but could use anyone else's input.  I'm at WWRUG so
it's somewhat limiting.
 
We ran truss and, when the services drop (arerror 91), we see the
following:
167
/11:    read(54, "\0FE\0\006\0\0\0\0\01017".., 2064)    = 254
/11:    write(54, "\0A1\0\006\0\0\0\0\003 ^".., 161)    = 161
/11:    read(54, "\0F7\0\006\0\0\0\0\01017".., 2064)    = 247
/11:        Incurred fault #6, FLTBOUNDS  %pc = 0xFE6A3558
/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
/11:        Received signal #11, SIGSEGV [caught]
/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
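
For a longer truss log, the fault and signal lines can be pulled out mechanically. The following Python sketch is written against the line shapes shown above only, so treat the regular expressions as assumptions rather than a full truss parser; the function and variable names are made up for this example. As far as I know, the "/11:" prefix in truss output is the LWP (thread) id, which here happens to equal the signal number 11.

```python
import re

# Patterns derived from the sample lines only; other truss output may differ.
FAULT_RE = re.compile(
    r"^/(\d+):\s+Incurred fault #(\d+), (\w+)\s+%pc = (0x[0-9A-Fa-f]+)")
SIGNAL_RE = re.compile(r"^/(\d+):\s+Received signal #(\d+), (\w+)")
SIGINFO_RE = re.compile(
    r"^/(\d+):\s+siginfo: \w+ (\w+) addr=(0x[0-9A-Fa-f]+)")

def summarize_truss(lines):
    """Collect fault and signal events, attaching siginfo details to each."""
    events = []
    for line in lines:
        m = FAULT_RE.match(line)
        if m:
            events.append({"kind": "fault", "lwp": int(m.group(1)),
                           "number": int(m.group(2)), "name": m.group(3),
                           "pc": m.group(4)})
            continue
        m = SIGNAL_RE.match(line)
        if m:
            events.append({"kind": "signal", "lwp": int(m.group(1)),
                           "number": int(m.group(2)), "name": m.group(3)})
            continue
        m = SIGINFO_RE.match(line)
        if m and events:
            # A siginfo line describes the event printed just before it.
            events[-1]["code"] = m.group(2)
            events[-1]["addr"] = m.group(3)
    return events

SAMPLE_LINES = [
    r'/11:    read(54, "\0F7\0\006\0\0\0\0\01017".., 2064)    = 247',
    "/11:        Incurred fault #6, FLTBOUNDS  %pc = 0xFE6A3558",
    "/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C",
    "/11:        Received signal #11, SIGSEGV [caught]",
    "/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C",
]
```

Running summarize_truss(SAMPLE_LINES) yields one fault event and one signal event, both on LWP 11, with the program counter and faulting address attached.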
 
The services do restart automatically, so armonitor is doing its job.  We've
commented out everything from armonitor but the arserverd command.
 
We stay up for between 2 and 10 minutes and then wham, we're down again.
Obviously this just started this morning.
 
Unix: Sun Solaris 10
Oracle 10g
ARS 7.0.1P2
 
They did expand the database size last night if that has any bearing.  But
we can connect to the database successfully when ar is down.
 
Nothing else helpful in arerror.log, only 91 error.
 
I'm at the Hardrock hotel, call room 30601 if you have questions or can
help!
 
Thanks,
Susan
 
 
 

 
_Platinum Sponsor: rmisoluti...@verizon.net ARSlist: "Where the Answers
Are"_ 



_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
