More specifically, check maxsiz.  Is the arserverd core dumping?  What
size is the core?
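
To answer both questions quickly, something like the following Python sketch may help. It is only an illustration: the assumption that core files are named `core` or `core.*` is mine (the naming can be configured differently on your system), and the limit it reports is the one inherited by the Python process, which is not necessarily the one arserverd runs under.

```python
import os
import resource


def core_limit():
    """Return the (soft, hard) core file size limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def find_cores(directory):
    """Yield (path, size_in_bytes) for files named 'core' or 'core.*'.

    The naming pattern is an assumption; adjust it to match your system.
    """
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name == "core" or name.startswith("core."):
                path = os.path.join(root, name)
                yield path, os.path.getsize(path)
```

Calling `list(find_cores(some_dir))` on the directory arserverd runs from would list any cores and their sizes. A core whose size sits exactly at the soft limit from `core_limit()` may have been truncated.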
 

Darrell Reading Systems Engineer 
Phone 479.204.5739 
dere...@wal-mart.com 

Wal-Mart Stores, Inc. 
805 Moberly Lane, MS-0560-68 
Bentonville, AR 72716 
Save Money. Live Better 

 

________________________________

From: Action Request System discussion list(ARSList)
[mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
Sent: Friday, November 13, 2009 14:13
To: arslist@ARSLIST.ORG
Subject: Re: Prod server down - services will not stay up


We've turned on both SQL and API logging now to capture the next event.
 
How would the db space affect this?  They actually just expanded it last
night.
 
Working with the unix guys in the office and support, just not the same
when you're not there.
 


 
On Fri, Nov 13, 2009 at 2:08 PM, Ben Chernys
<ben.cher...@softwaretoolhouse.com> wrote:


        These are a bit of a pain to solve.  SQL logging on startup is
the key.  The logs are quite big but usually the last lines will be
pertinent.  You also need to know the database structure of the
meta-data - given by the database reference guide.
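
Since the SQL log can be very large and only its end is usually of interest, a small helper along these lines can pull out the last lines without loading the whole file. This is a rough sketch: the function name and its defaults are made up for illustration, and you pass in whatever path your SQL log is actually written to.

```python
import os


def tail_lines(path, count=50, block_size=8192):
    """Return the last `count` lines of a file without reading all of it."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Read backwards one block at a time until more than `count`
        # newlines are buffered, so the first kept line is complete.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    lines = data.splitlines()
    return [line.decode("utf-8", errors="replace") for line in lines[-count:]]
```

For example, `tail_lines(sql_log_path, 100)` returns the final 100 lines as strings, which is normally where the last meta-data table read before the crash shows up.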
         
        I'm afraid that these types of problems are not likely to get
solved whilst in a hotel room as once you have the idea of where the
problem lies (through the log) you then need to research the meta-data
itself.  The sql log will simply let you know the meta-data table that
was last read and not which record of that table caused the server to
crash.
         
        7.0.1 P2 seems a rather low patch level.  Is it possible to patch
the binaries?
         
        It is unlikely that simply allocating more database space is the
problem.  You could also check whether the temporary space was increased,
but I would go with the logs first.
         
        A crash after 2 to 10 minutes will almost surely fall within the
server's initial processing of the meta-data.
        Cheers
        Ben
        
________________________________

        
        From: Action Request System discussion list(ARSList)
[mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
        
        Sent: November 13, 2009 8:57 PM
        To: arslist@ARSLIST.ORG
        Subject: Re: Prod server down - services will not stay up
        
        
        Thanks Ben
         
        We're having problems determining where the 11 is coming from.
        
        
        On Fri, Nov 13, 2009 at 1:49 PM, Ben Chernys
<ben.cher...@softwaretoolhouse.com> wrote:
        

                 
                PS.  The 91 is a red herring.  It's the Sig 11 (SEGV)
you need to worry about.   The 91 is another process not being able to
communicate with the arserverd process.
                 
                Cheers
                Ben

________________________________

                From: Ben Chernys
[mailto:ben.cher...@softwaretoolhouse.com] 
                Sent: November 13, 2009 8:42 PM
                To: 'arslist@ARSLIST.ORG'
                Subject: RE: Prod server down - services will not stay
up
                
                
                The signal 11 is bad code - simple as that.  It's a
"segmentation violation" which means that the server (arserverd)
attempted to read or write to an address not allocated to its virtual
space.  It can also be caused by a double free or two pointers to one
block which has been freed.  In any event, you cannot fix this without
the ARS source code which I expect you would find hard to get.
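
To make "signal 11" concrete, here is a small Python sketch of how a parent process on Unix can tell that a child was killed by a signal rather than exiting normally. The child here is a stand-in that sends SIGSEGV to itself; it is not arserverd, and the sketch makes no claim about how armonitor itself detects the failure.

```python
import signal
import subprocess
import sys


def run_and_report(command):
    """Run command; return the terminating signal's name, or None on normal exit."""
    result = subprocess.run(command)
    # On POSIX, a negative return code -N means the child died from signal N.
    if result.returncode < 0:
        return signal.Signals(-result.returncode).name
    return None


if __name__ == "__main__":
    # Stand-in child that kills itself with SIGSEGV (hypothetical example).
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_report(child))
```

The point is only that the signal number is reported by the operating system when the process dies, independently of anything written to arerror.log.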
                 
                That being said, the easiest way to determine (and then
circumvent) these types of things is to turn on SQL logging on the
server before the system starts (through the ar.conf file).  The exact
settings are in the configuring ARS guide.
                 
                Then, when the blow up happens, see what the server was
attempting to do.  You can usually spot some possible internal database
inconsistencies (in ARS meta-data) in this way and then repair them
manually through SQL before the ARS start-up.
                 
                Additionally, there may be patches available that
address the problem.
                 
                Cheers
                Ben Chernys
                 
                 

________________________________

                From: Action Request System discussion list(ARSList)
[mailto:arsl...@arslist.org] On Behalf Of Susan Palmer
                Sent: November 13, 2009 8:30 PM
                To: arslist@ARSLIST.ORG
                Subject: Prod server down - services will not stay up
                
                
                Help !!
                 
                Working with support but could use anyone else's input.
I'm at WWRUG so it's somewhat limiting.
                 
                We did a truss log and when the services drop
(arerror 91) we see the following:
                167
                /11:    read(54, "\0FE\0\006\0\0\0\0\01017".., 2064) = 254
                /11:    write(54, "\0A1\0\006\0\0\0\0\003 ^".., 161) = 161
                /11:    read(54, "\0F7\0\006\0\0\0\0\01017".., 2064) = 247
                /11:        Incurred fault #6, FLTBOUNDS  %pc = 0xFE6A3558
                /11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
                /11:        Received signal #11, SIGSEGV [caught]
                /11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
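
When the truss log is long, a short script can pull out just the fault and signal lines. The Python sketch below is written against the excerpt above only. The regular expressions and field names are my own, and reading the `/11:` prefix as the thread (LWP) id is an assumption about the truss output format rather than something confirmed in this thread.

```python
import re

# Excerpt from the truss log above, with wrapped lines rejoined.
SAMPLE = r"""
/11:    read(54, "\0F7\0\006\0\0\0\0\01017".., 2064) = 247
/11:        Incurred fault #6, FLTBOUNDS  %pc = 0xFE6A3558
/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
/11:        Received signal #11, SIGSEGV [caught]
/11:          siginfo: SIGSEGV SEGV_MAPERR addr=0xFB47FB4C
"""

FAULT = re.compile(
    r"^/(\d+):\s+Incurred fault #(\d+), (\w+)\s+%pc = (0x[0-9A-Fa-f]+)")
SIGNAL = re.compile(r"^/(\d+):\s+Received signal #(\d+), (\w+)")
SIGINFO = re.compile(
    r"^/(\d+):\s+siginfo: (\w+) (\w+) addr=(0x[0-9A-Fa-f]+)")


def crash_events(text):
    """Return one dict per fault, signal, or siginfo line found in text."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        match = FAULT.match(line)
        if match:
            events.append({"lwp": int(match.group(1)), "kind": "fault",
                           "number": int(match.group(2)),
                           "name": match.group(3), "pc": match.group(4)})
            continue
        match = SIGNAL.match(line)
        if match:
            events.append({"lwp": int(match.group(1)), "kind": "signal",
                           "number": int(match.group(2)),
                           "name": match.group(3)})
            continue
        match = SIGINFO.match(line)
        if match:
            events.append({"lwp": int(match.group(1)), "kind": "siginfo",
                           "name": match.group(2), "code": match.group(3),
                           "addr": match.group(4)})
    return events


if __name__ == "__main__":
    for event in crash_events(SAMPLE):
        print(event)
```

Collecting the `pc` and `addr` values across several crashes would show whether the server dies at the same place every time, which is useful to pass on to support.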
                 
                The services do restart automatically so armonitor is
doing its job.  We've commented out everything from armonitor but the
arserverd command.
                 
                We stay up for between 2-10 minutes and then wham, we're
down again.  Obviously this just started this morning.
                 
                unix sun solaris 10
                oracle 10g
                ars 7.0.1P2
                 
                They did expand the database size last night if that has
any bearing.  But we can connect to the database successfully when ar is
down.
                 
                Nothing else helpful in arerror.log, only 91 error.
                 
                I'm at the Hardrock hotel, call room 30601 if you have
questions or can help!
                 
                Thanks,
                Susan
                 
                 
                 

                 
                _Platinum Sponsor: rmisoluti...@verizon.net ARSlist:
"Where the Answers Are"_ 








_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
