We have been seeing the aremail engine hang every few days (7.1.00.002 on Win2K3 Ent x64 with Java 1.5.0_14 (32-bit), on an 8-core HP server with 12 GB RAM) ever since we went live in May. With the fall semester, we have seen outbound mail traffic go up to 1200-1400 messages a day, and the crashes have increased in frequency. When the mail service crashes, it hangs with over 1200 handles open; normally it sits below 1000 and runs up to 1300-1380 only while actually processing messages. Trying to stop it from the Services MMC results in a long wait while it tries to do so (it is hung, not dead), followed by a timeout error; then you can manually restart it.
Support told us to upgrade the Outlook client from 2003 SP1 to 2007, which we needed to do anyway since the mailbox underneath it is switching over from Exchange 2000 to 2007. That resulted in the service hanging almost hourly during the business day instead of once every day or two. Most of the time when it hangs there is nothing in the stderr.log at all, but with Outlook 2007 it writes a 14 KB detailed Exception Access Violation HotSpot Virtual Machine error log (hs_err_pid####.log) in the \Outlook12 directory (if it was doing this in the Outlook 11 directory we never saw it).

If you log into the server console after one of these events has happened, you get a deferred warning that the javaservice failed; clicking on the details gets you this:

    szAppName : aremaild.exe
    szAppVer  : 1.1.0.0
    szModName : EMSMDB32.DLL
    szModVer  : 12.0.4518.1014
    offset    : 000145b4

or...

    szAppName : aremaild.exe
    szAppVer  : 1.1.0.0
    szModName : ntdll.dll
    szModVer  : 5.2.3790.3959
    offset    : 0005ec97

In the event log you usually get:

    12:49 PM: Faulting application aremaild.exe, version 1.1.0.0, faulting module ntdll.dll, version 5.2.3790.3959, fault address 0x0005ec97.
    1:00 PM: Reporting queued error: faulting application aremaild.exe, version 1.1.0.0, faulting module ntdll.dll, version 5.2.3790.3959, fault address 0x0005ec97.
    1:02 PM: Fault bucket 911867558.

Last night I updated Outlook 2007 to SP1 (it is still pointing at a 2000 server mailbox), added the line "External-Authentication-Return-Data-Capabilities: 31" to the ar.cfg, and restarted the server. The arplugin.log showed that the email engine had been doing extensive lookups against our LDAP server for all kinds of notification information that isn't stored there and is already in the User form. Since most of the service hangs occur during the processing of a group notification to 8-10 users in a single group, I thought that the delay imposed as it tried fruitlessly to look up data in LDAP might be causing the problem.
As it turns out, that was not the problem - the service has now hung six times in four hours with hs_err_pid####.log files, and appears to have restarted on its own a couple-three times, after which it does not always process mail. Several times when I have restarted the service, it has not even cleared a backlog of 50-100 messages before hanging again.

Support has not been much help here, as we have had not one but two issues open since mid-June (they closed one issue in error and had to open another), so I wondered if anyone else had run into this severe a problem and found a solution that worked. I am at the point of writing batch files and scheduling them to do a net stop (and wait for the hung service to release) followed by a net start every 15 minutes or so, and I have had to tell the entire IT staff here this morning that they cannot count on any of the email notifications to arrive in a timely manner.

Why didn't they keep this engine in C, where it was rock solid for YEARS!???

Christopher Strauss, Ph.D.
Call Tracking Administration Manager
University of North Texas Computing & IT Center
http://itsm.unt.edu/
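
The scheduled stop / wait / start workaround mentioned above could be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions, not a tested fix and not anything supplied by Support: the service name is a placeholder (substitute whatever the Services MMC shows), the 300-second wait and 5-second poll interval are arbitrary, and the function and parameter names are made up for illustration. It relies only on the standard Windows commands `net stop`, `net start`, and `sc query`.

```python
import subprocess
import time

# Placeholder service name (assumption): use the name shown in the Services MMC.
SERVICE_NAME = "BMC Remedy Email Engine"


def run(cmd):
    """Run a command and return (exit_code, stdout_text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def is_stopped(service, run_cmd=run):
    """True once `sc query` reports the service in the STOPPED state."""
    _, out = run_cmd(["sc", "query", service])
    return "STOPPED" in out


def restart_service(service, run_cmd=run, sleep=time.sleep,
                    wait_seconds=300, poll_seconds=5):
    """Stop the service, wait for it to release, then start it again.

    Returns True if the start command succeeded, False if the service
    never reached STOPPED within wait_seconds (illustrative limits).
    """
    # `net stop` may itself report a timeout when the service is hung,
    # so its exit code is ignored and the real state is polled instead.
    run_cmd(["net", "stop", service])

    waited = 0
    while not is_stopped(service, run_cmd):
        if waited >= wait_seconds:
            return False  # still hung; leave it for a manual restart
        sleep(poll_seconds)
        waited += poll_seconds

    code, _ = run_cmd(["net", "start", service])
    return code == 0
```

Calling `restart_service(SERVICE_NAME)` from a small script run by the Windows task scheduler every 15 minutes or so would approximate the batch-file approach. The command runner and sleep function are passed in as parameters only so the logic can be exercised without touching a real service.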

