Stacy,

Failfast is enabled on pmfd, so that if the daemon dies, it will panic the node. This makes sense in a multi-node cluster, but not so much on a single-node cluster. You can temporarily disable failfast on your single node, until the next reboot, by running the following command:

/usr/cluster/lib/sc/cmm_ctl -f

pmfd should have dropped a core file when it died. Execute 'coreadm' to get a clue where the core file might be located. For further diagnosis, I would want to see the output produced by executing 'pstack' on that core file.

--Marty

On 02/ 4/10 01:16 PM, Hartmut Streppel wrote:
> Hi Stacy,
> the log file shows that prior to pmfd disappearing, your acsls-rg had
> severe problems. Although it is not directly obvious that this has
> caused pmfd to die, I would first diagnose the RG problem. What is the
> Failover_mode property of acsls-rg set to?
>
> Regards
> Hartmut
>
> Stacy Maydew wrote:
>> Hi all,
>>
>> Running Opensolaris 2009.06 and OCHA 2009.06 on an x64 machine.
>>
>> We're trying to set up and test a single-node cluster and during the
>> tests that online/offline the services under cluster control, the
>> system occasionally panics unexpectedly. Any insights would be
>> greatly appreciated.
>>
>> The following error message is generated:
>>
>> 656416 libsecurity, door_call: Fatal, the server is not available.
>>
>> *Description:*
>>
>> The client (libpmf/libfe/libscha) is trying to communicate with the
>> server (rpc.pmfd/rpc.fed/rgmd) but is failing because the server
>> might be down.
>>
>> *Solution:*
>>
>> Save the /var/adm/messages files on each node. Contact your
>> authorized Sun service provider to determine whether a workaround or
>> patch is available.
>>
>> ----------------------------------------------------------------------------------------------
>>
>> The following error messages appear at the time of the panic in
>> /var/adm/messages:
>>
>> Feb 2 10:54:07 vdev30ga Cluster.RGM.global.rgmd: [ID 424774 daemon.error] Resource group <acsls-rg> requires operator attention due to STOP failure
>> Feb 2 10:54:30 vdev30ga unix: [ID 836849 kern.notice]
>> Feb 2 10:54:30 vdev30ga ^Mpanic[cpu0]/thread=ffffff001e8bfc60:
>> Feb 2 10:54:30 vdev30ga genunix: [ID 562397 kern.notice] Failfast: Aborting zone "global" (zone ID 0) because "pmfd" died 35 seconds ago.
>> Feb 2 10:54:30 vdev30ga unix: [ID 100000 kern.notice]
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bf8c0 genunix:vcmn_err+2c ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bf8d0 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+1f ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bf9b0 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+8c ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bf9e0 cl_haci:__1cHff_implPstop_node_panic6M_v_+3b4 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfa00 cl_haci:__1cHff_implNunit_timedout6M_v_+53 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfa20 cl_haci:__1cLff_timedout6Fpc_v_+11 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfa70 cl_haci:__1cQff_callout_tableTper_tick_processing6F_v_+c7 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfaa0 cl_haci:__1cNff_admin_implWsc_per_tick_processing6Mn0AQcallout_caller_t__v_+83 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfab0 cl_haci:__1cNff_admin_implQff_clock_callout6F_v_+12 ()
>> Feb 2 10:54:30 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfb10 genunix:clock+346 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfbc0 genunix:cyclic_softint+dc ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfbd0 unix:cbe_softclock+1a ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfc10 unix:av_dispatch_softvect+5f ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e8bfc40 unix:dispatch_softint+34 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805a60 unix:switch_sp_and_call+13 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805a90 unix:dosoftint+59 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805ae0 unix:do_interrupt+fc ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805af0 unix:cmnint+ba ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805be0 unix:mach_cpu_idle+b ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805c10 unix:cpu_idle+c0 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805c20 unix:cpu_idle_adaptive+19 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805c40 unix:idle+114 ()
>> Feb 2 10:54:31 vdev30ga genunix: [ID 655072 kern.notice] ffffff001e805c50 unix:thread_start+8 ()
>> Feb 2 10:54:31 vdev30ga unix: [ID 100000 kern.notice]
>> Feb 2 10:54:31 vdev30ga genunix: [ID 672855 kern.notice] syncing file systems...
>> Feb 2 10:54:31 vdev30ga genunix: [ID 904073 kern.notice] done
>>
>
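
Since the suggestion in this thread is to diagnose the resource group problem before the pmfd death, it may help to pull just the rgmd errors and the failfast record out of the messages file, in order. The following is only a rough sketch: the function name cluster_events is made up for illustration, it assumes each syslog record sits on a single line in the file (unlike the wrapped copy quoted above), and it assumes a grep that accepts -E (on Solaris that may mean /usr/xpg4/bin/grep or GNU grep rather than /usr/bin/grep).

```shell
# Hypothetical helper (not part of any cluster tooling): print the rgmd
# daemon.error records and any failfast panic record from a syslog-style
# file, in the order they appear.
# Assumes one record per line and a grep that supports -E.
cluster_events() {
    grep -E 'Cluster\.RGM\..*daemon\.error|Failfast: Aborting' "$1"
}
```

Something like `cluster_events /var/adm/messages` would then show the STOP failure on acsls-rg followed by the failfast abort, which makes the gap between the two events easy to read off.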