What's in the hs_err_pidxxx.log files? These are usually indicative of a memory access violation or a bug in the JVM. With the BMC products it is usually an access violation related to the native libraries. If that is the case, you will want to generate a core file to send to BMC; with it they can isolate the cause of the issue, assuming they have someone available who knows how.
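If the hs_err files are piling up, a quick way to see what they have in common is to pull the "Problematic frame" entry out of each one. The sketch below is only a rough illustration in Python. It assumes the log header contains a "# Problematic frame:" line followed by a line such as "# C  [module+0xoffset]", which is the layout I would expect from a HotSpot error log, but check it against your own files since the layout can differ between JVM versions. The function names are made up for this example.

```python
import re
from collections import Counter

# Assumed header layout (verify against a real hs_err_pid*.log):
#   # Problematic frame:
#   # C  [SOMEMODULE.DLL+0x1234]
FRAME_RE = re.compile(
    r"^#\s+Problematic frame:\s*\n#\s+(\S+)\s+(.*)$", re.MULTILINE
)
# Native frames are assumed to look like [module+0xoffset].
MODULE_RE = re.compile(r"\[([^\]+]+)\+(0x[0-9a-fA-F]+)\]")


def summarize_hs_err(text):
    """Return frame type, raw detail, module and offset, or None if no frame is found."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    frame_type = match.group(1)
    detail = match.group(2).strip()
    module = MODULE_RE.search(detail)
    return {
        "frame_type": frame_type,  # "C" would indicate a native frame
        "detail": detail,
        "module": module.group(1) if module else None,
        "offset": module.group(2) if module else None,
    }


def tally_modules(texts):
    """Count how often each faulting module appears across several log texts."""
    counts = Counter()
    for text in texts:
        summary = summarize_hs_err(text)
        if summary and summary["module"]:
            counts[summary["module"]] += 1
        else:
            counts["(unparsed)"] += 1
    return counts
```

To use it, read each hs_err_pid*.log into a string and pass the list to tally_modules(). If the same native DLL shows up every time, that is worth including along with the core file when you open the issue.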
The following article provides instructions on generating the Windows equivalent of a core file:
http://docs.sun.com/app/docs/doc/820-0437/6nc66m9qq?a=view

Axton Grams

On Fri, Sep 5, 2008 at 1:22 PM, strauss <[EMAIL PROTECTED]> wrote:
> We have been seeing the aremail engine hanging every few days
> (7.1.00.002 on Win2K3 Ent x64 w java 1.5.0_14 (32-bit) on 12gb RAM x 8
> core HP server) ever since we went live in May. With the fall semester,
> we have seen outbound mail traffic go up to 1200-1400 messages a day,
> and the crashes have increased in frequency. When the mail service
> crashes, it hangs in a state where there are over 1200 handles open,
> compared to where it usually sits below 1000 and runs up to 1300-1380
> when actually processing messages. Trying to stop it from the Services
> MMC results in a long wait while it tries to do so (it is hung, not
> dead), followed by a timeout error; then you can manually restart it.
>
> Support told us to upgrade the Outlook client from 2003 Sp1 to 2007,
> which we needed to do anyway since the mail box underneath it is
> switching over from Exchange 2000 to 2007. That resulted in the service
> hanging almost hourly during the business day instead of once every day
> or two. Most of the time when it hangs there is nothing in the
> stderr.log at all, but with Outlook 2007 it writes a 14kb detailed
> Exception Access Violation HotSpot Virtual Machine error log
> (hs_err_pid####.log) in the \Outlook12 directory (if it was doing this
> in the Outlook 11 directory we never saw it). If you log into the
> server console after one of these events has happened, you get a
> deferred warning that the javaservice failed; clicking on the details
> gets you this:
>
> szAppName : aremaild.exe szAppVer : 1.1.0.0 szModName :
> EMSMDB32.DLL
> szModVer : 12.0.4518.1014 offset : 000145b4
>
> or...
>
> szAppName : aremaild.exe szAppVer : 1.1.0.0 szModName :
> ntdll.dll
> szModVer : 5.2.3790.3959 offset : 0005ec97
>
> In the event log you usually get:
>
> 12:49 PM: Faulting application aremaild.exe, version 1.1.0.0, faulting
> module ntdll.dll, version 5.2.3790.3959, fault address 0x0005ec97.
> 1:00 PM: Reporting queued error: faulting application aremaild.exe,
> version 1.1.0.0, faulting module ntdll.dll, version 5.2.3790.3959, fault
> address 0x0005ec97.
> 1:02 PM: Fault bucket 911867558.
>
> Last night I updated Outlook 2007 to Sp1 (it is still pointing at a 2000
> server mailbox), and added the line
> "External-Authentication-Return-Data-Capabilities: 31" to the ar.cfg and
> restarted the server. The email engine had been doing extensive lookups
> against our LDAP server for all kinds of notification information that
> isn't stored there, that is already in the User form, as evidenced in
> the arplugin.log, and since most of the service hangs occur during the
> processing of a group notification to 8-10 users in a single group, I
> thought that the delay imposed as it tried to fruitlessly look up data
> in LDAP might be causing the problem. As it turns out, that was not the
> problem - the service has now hung six times in four hours with
> hs_err_pid####.log files, and appears to have restarted on its own a
> couple-three times, after which it does not always process mail. Several
> times when I have restarted the service, it has not even cleared a
> backlog of 50-100 messages before hanging again.
>
> Support has not been much help here, as we have had not one but two
> issues open since mid-June (they closed one issue in error and had to
> open another), so I wondered if anyone else had run into this severe a
> problem and found a solution that worked.
> I am at the point of writing
> batch files and scheduling them to do a net stop (and wait for the hung
> service to release) followed by a net start every 15 minutes or so, and
> I have had to tell the entire IT staff here this morning that they
> cannot count on any of the email notifications to arrive in a timely
> manner.
>
> Why didn't they keep this engine in C, where it was rock solid for
> YEARS!???
>
> Christopher Strauss, Ph.D.
> Call Tracking Administration Manager
> University of North Texas Computing & IT Center
> http://itsm.unt.edu/
>
>
> _______________________________________________________________________________
> UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
> Platinum Sponsor: www.rmsportal.com ARSlist: "Where the Answers Are"

