Apologies for not detailing the env further (can't y'all read my mind?). So...
Three environments: Prod, Test, Dev. All single servers, no AR Server groups involved. All are ARS 7.1, CMDB 2.01, ITSM 6.x on ORA 10g.

Yes, the versions are old and 7.1 is unsupported, but it was working just fine until the weekend of Sep 24. The ORA folks swear nothing changed on or about that date. I've mentioned upgrading ARS, and BMC Support mentioned it too -- the client will not upgrade unless there is no other option.

The issue has manifested itself only in the Test and Dev environments, and the effects have been similar. Prod has not been impacted. The Test and Dev databases are both on Oracle RAC environments. I do not think Prod is on RAC, but I could be wrong. I have no idea how RAC is configured -- we don't have direct access to the DB host(s), and the DBAs don't seem very interested in helping.

Dev is a VM and was installed from scratch. Test is a VM and was a copy of the Prod DB.

TNS config is unknown, but I'm fairly certain we have access to the ARS host, so I will investigate that side of it. Setting affinity to the primary RAC node sounds mandatory to me, and I'll inquire about that. I'm also going to see if we can move the DB off the RAC cluster and put it on a stand-alone Oracle host. I don't think they will allow that, but I can ask. It seems like a reasonable way to determine whether this is an ARS/RAC issue or not.

Troubleshooting: arerror.log reveals nothing more than "Server is unable to load <form, guide, whatever>". BMC Support went straight to the field-count discrepancy and got tunnel vision on that. To be fair, the client's security severely restricts BMC Support's troubleshooting avenues. All they could (and did) do was try to address the symptoms. Otherwise, due to looming deadlines, the client just restored DB backups to repair the symptoms. Given the recurrences, the DB restore obviously does not resolve the underlying issue.

That's about where we are. I was offsite last week with a different client, but I'm back on-site tomorrow. It will be interesting to see how it's going...
JDH

On Sun, Oct 16, 2011 at 4:25 AM, John Peto <[email protected]> wrote:

> Hi JD,
>
> I think it's got to be something misconfigured; it's obviously not random,
> as it's only affecting the system data as you've described.
> I think for more help you need to explain the config. How's the TNS set up?
> Is it a server group? Is there a load balancer for the DB?
> How many app servers vs. DB nodes? If there was a RAC problem then surely
> all data would be affected. Surely updates to the data described only
> happen when you make admin changes? I had some crazy thoughts about things
> like a dev or test environment somehow hitting the production DB via a
> misconfigured load-balanced VIP for the DB that is pointing to a prod node
> (we never used a load-balanced DB connection for RAC; the TNS failover gave
> us much more control). That could cause the problems described. Could
> different servers in a server group be enabled for admin somehow (not sure
> if you'd get the same problem though)?
>
> I'd definitely add TNS affinity for the admin server to node 1, and if it's
> a server group I'd have all other servers set to NULL in the rankings for
> admin.
>
> It's like the cache is getting messed up - which is understandable if
> you've got DB problems - but why that gets written back to the DB is a
> mystery. We don't use any out-of-the-box stuff, so it could be ITSM-type
> stuff that is doing commits to the DB.
>
> You might list all the things you've checked so far?
>
> Regards,
>
> JP.
>
> _______________________________________________________________________________
> UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
> attend wwrug11 www.wwrug.com ARSList: "Where the Answers Are"

