Another update:

643/1: zone_lookup(0x00000000) = 0
643/1: zone_lookup(0x00000000) = 0
643/1: zone_lookup(0x00000000) = 0
643/1: lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
643/1: door_call(256, 0x08046C80) = 0
643/1: lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
643/1: close(4) = 0
643/1: zone_lookup(0x00000000) = 0
643/1: lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
643/1: door_call(259, 0x08046D40) = 0
643/1: lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
643/1: brk(0x081BF608) = 0
643/1: brk(0x081DF608) = 0
643/1: getpid() = 643 [639]
643/1: close(263) = 0
643/1: fxstat(2, -1, 0x08046FF0) Err#9 EBADF
643/1: lseek(1, 0, SEEK_CUR) Err#29 ESPIPE
643/1: lseek(2, 0, SEEK_CUR) = 50933
643/1: lseek(2, 0, SEEK_CUR) = 50974
643/1: lseek(1, 0, SEEK_CUR) Err#29 ESPIPE
643/1: lseek(2, 0, SEEK_CUR) = 51062
643/1: lseek(2, 0, SEEK_CUR) = 51103
643/1: _exit(0)

Full truss output attached.

On Thu, Nov 26, 2009 at 10:51 AM, Piotr Jasiukajtis <estseg at gmail.com> wrote:
> Hi,
>
> Thread update:
>
>> *cladm_dbg/s
> 0xffffff04f55ad008: th ffffff04f834b580 tm 17107404: failfastd(385):start
> th ffffff04f834b580 tm 17107413: failfastd(385):fork1
> th ffffff04f834b580 tm 17107464: failfastd(385):fork1
> th ffffff04f834b580 tm 17107465: failfastd(385):done
> th ffffff04f8349880 tm 17107509: failfastd(393):fork1
> th ffffff04f8351c20 tm 17107541: cl_exec384,1:Main: Default sched class = 1
> th ffffff04f8351c20 tm 17107543: cl_exec384,1:Main: starting the cl_exec service
> th ffffff04f8351c20 tm 17107744: cl_exec384,1:Main: wait for daemon to be ready
> th ffffff04f8351c20 tm 17107753: cl_exec384,1:Main: cl_exec server object : <cl_exec.1>
> th ffffff04f8349880 tm 17107857: failfastd(393):ready
> th ffffff04f8349880 tm 17107866: failfastd(393):synchro file
> th ffffff04f8349880 tm 17107869: failfastd(393):write pipe
> th ffffff04f834b580 tm 17107870: failfastd(385):read pipe
> th ffffff04f834b580 tm 17107871: failfastd(385):exit
> th ffffff04ea72be20 tm 17107892: cl_exec394,1:Worker: create daemon process
> th ffffff04ea72be20 tm 17107945: cl_exec394,1:Worker: starting
> th ffffff04ea72be20 tm 17107950: cl_exec394,1:Worker: create signals thread
> th ffffff04f8348a80 tm 17107961: cl_exec394,3:signal thread starting
> th ffffff04f8348e00 tm 17108028: cl_exec395,1:Daemon: starting daemon process
> th ffffff04f8348e00 tm 17108101: cl_exec395,1:Daemon: create signals thread
> th ffffff04eb1e0580 tm 17108110: cl_exec395,3:signal thread starting
> th ffffff04f8348e00 tm 17108223: cl_exec395,1:Daemon: bind server object <cl_exec.1>
> th ffffff04f8351c20 tm 17117775: cl_exec384,1:Main: wait for cl_exec obj
> th ffffff04f8351c20 tm 17117839: cl_exec384,1:Main: cl_exec obj resolved in name server
> th ffffff04f8351c20 tm 17117841: cl_exec384,1:Main: daemon is ready
> th ffffff04f8351c20 tm 17117841: cl_exec384,1:Main: service is online
> th ffffff04f833cb00 tm 17122308: clexec405,1:main
> th ffffff04f833cb00 tm 17122312: clexec405,1:daemonize
> th ffffff04f833cb00 tm 17122363: clexec405,1:daemonize fork
> th ffffff04f833cb00 tm 17122363: clexec405,1:wait_for_daemon
> th ffffff04f833c780 tm 17122420: clexec406,1:create_process_pair fork1
> th ffffff04f833c780 tm 17122464: clexec406,1:daemon_process
> th ffffff04f833c400 tm 17122513: clexec407,1:worker_process
> th ffffff04f833c780 tm 17122927: clexec406,1:daemon_process ready
> th ffffff04f833cb00 tm 17132542: clexec405,1:wait ha_mounter
> th ffffff04f833c400 tm 17132593: clexec407,1:wait signal thread
> th ffffff04f833cb00 tm 17132603: clexec405,1:nameserver resolved
> th ffffff04f833cb00 tm 17132612: clexec405,1:end file created
> th ffffff04f833cb00 tm 17132613: clexec405,1:main_end
> th ffffff04f9f54760 tm 17698695: cmm_callback_worker:ha_mounter.1 exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus
> th ffffff04f8339c20 tm 17698705: clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c reset_shared_bus>
> th ffffff04faf808a0 tm 17698729: clexec407,3:worker_thread fork1 </usr/cluster/lib/sc/run_reserve -c reset_shared_bus> fd 3
> th ffffff04f9f48740 tm 17698876: clexec682,3:execl </usr/cluster/lib/sc/run_reserve -c reset_shared_bus>
> th ffffff04f833c080 tm 17707441: clexec407,2:catch signal 18 18 si_code 1 si_pid 682 si_uid 3
> th ffffff04faf808a0 tm 17707444: clexec407,3:<0> fd 3 retval 0 data.len 6
> th ffffff04f8339c20 tm 17707457: clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c reset_shared_bus> error 0
> th ffffff04f9f54760 tm 17707462: cmm_callback_worker:ha_mounter.1 exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus excep 0
> th ffffff04fa50cc80 tm 17709101: mount_client_impl::activate:ha_mounter.1 alive 1 except 0
> th ffffff04f833c080 tm 17717030: clexec407,2:do_log clexecd: Got an unexpected signal 18 in process work_process (pid=407, ppid=406)
> th ffffff04f833c080 tm 17717055: clexec407,2:do_exit 1
> th ffffff04f833b880 tm 17717415: clexec406,3:do_log clexecd: Daemon exiting because child died.
> th ffffff04f833b880 tm 17717437: clexec406,3:do_exit 4
>
>
> So this command fails on boot:
>
> /usr/cluster/lib/sc/reserve -c reset_shared_bus -h ohac-test-2
>
> Btw, I tried that on b124 and b127.
>
> On Wed, Nov 25, 2009 at 11:53 AM, Piotr Jasiukajtis <estseg at gmail.com> wrote:
>> Hi,
>>
>> I have built OHAC core+agents against ON b127.
>>
>> Once the cluster is booted, the 'clexecd' daemon crashes and failfast
>> restarts the node.
>>
>> Oct 10 14:43:10 ohac-test-2 : [ID 719008 daemon.error] clexecd: Got an unexpected signal 18 in process work_process (pid=427, ppid=426)
>> Oct 10 14:43:10 ohac-test-2 Cluster.Framework: [ID 899305 daemon.error] clexecd: Daemon exiting because child died.
>> Oct 10 14:43:13 ohac-test-2 savecore: [ID 570001 auth.error] reboot after panic: Failfast: Aborting zone "global" (zone ID 0) because "clexecd" died 30 seconds ago.
>>
>> Has anyone from Sun tried to build OHAC core against the latest ON
>> build? If so, do you have a patch for that or something?
>>
>> Thanks and regards,
>>
>> --
>> Piotr Jasiukajtis | estibi | SCA OS0072
>> http://estseg.blogspot.com
>>
>
>
>
> --
> Piotr Jasiukajtis | estibi | SCA OS0072
> http://estseg.blogspot.com
>

--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: truss_reserve.log.gz
Type: application/x-gzip
Size: 3142 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20091126/d8cb6e92/attachment.bin>