Hi,

Thread update:

> *cladm_dbg/s
0xffffff04f55ad008:             th ffffff04f834b580 tm  17107404: failfastd(385):start
th ffffff04f834b580 tm  17107413: failfastd(385):fork1
th ffffff04f834b580 tm  17107464: failfastd(385):fork1
th ffffff04f834b580 tm  17107465: failfastd(385):done
th ffffff04f8349880 tm  17107509: failfastd(393):fork1
th ffffff04f8351c20 tm  17107541: cl_exec384,1:Main: Default sched class = 1
th ffffff04f8351c20 tm  17107543: cl_exec384,1:Main: starting the cl_exec service
th ffffff04f8351c20 tm  17107744: cl_exec384,1:Main: wait for daemon to be ready
th ffffff04f8351c20 tm  17107753: cl_exec384,1:Main: cl_exec server object : <cl_exec.1>
th ffffff04f8349880 tm  17107857: failfastd(393):ready
th ffffff04f8349880 tm  17107866: failfastd(393):synchro file
th ffffff04f8349880 tm  17107869: failfastd(393):write pipe
th ffffff04f834b580 tm  17107870: failfastd(385):read pipe
th ffffff04f834b580 tm  17107871: failfastd(385):exit
th ffffff04ea72be20 tm  17107892: cl_exec394,1:Worker: create daemon process
th ffffff04ea72be20 tm  17107945: cl_exec394,1:Worker: starting
th ffffff04ea72be20 tm  17107950: cl_exec394,1:Worker: create signals thread
th ffffff04f8348a80 tm  17107961: cl_exec394,3:signal thread starting
th ffffff04f8348e00 tm  17108028: cl_exec395,1:Daemon: starting daemon process
th ffffff04f8348e00 tm  17108101: cl_exec395,1:Daemon: create signals thread
th ffffff04eb1e0580 tm  17108110: cl_exec395,3:signal thread starting
th ffffff04f8348e00 tm  17108223: cl_exec395,1:Daemon: bind server object <cl_exec.1>
th ffffff04f8351c20 tm  17117775: cl_exec384,1:Main: wait for cl_exec obj
th ffffff04f8351c20 tm  17117839: cl_exec384,1:Main: cl_exec obj resolved in name server
th ffffff04f8351c20 tm  17117841: cl_exec384,1:Main: daemon is ready
th ffffff04f8351c20 tm  17117841: cl_exec384,1:Main: service is online
th ffffff04f833cb00 tm  17122308: clexec405,1:main
th ffffff04f833cb00 tm  17122312: clexec405,1:daemonize
th ffffff04f833cb00 tm  17122363: clexec405,1:daemonize fork
th ffffff04f833cb00 tm  17122363: clexec405,1:wait_for_daemon
th ffffff04f833c780 tm  17122420: clexec406,1:create_process_pair fork1
th ffffff04f833c780 tm  17122464: clexec406,1:daemon_process
th ffffff04f833c400 tm  17122513: clexec407,1:worker_process
th ffffff04f833c780 tm  17122927: clexec406,1:daemon_process ready
th ffffff04f833cb00 tm  17132542: clexec405,1:wait ha_mounter
th ffffff04f833c400 tm  17132593: clexec407,1:wait signal thread
th ffffff04f833cb00 tm  17132603: clexec405,1:nameserver resolved
th ffffff04f833cb00 tm  17132612: clexec405,1:end file created
th ffffff04f833cb00 tm  17132613: clexec405,1:main_end
th ffffff04f9f54760 tm  17698695: cmm_callback_worker:ha_mounter.1 exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus
th ffffff04f8339c20 tm  17698705: clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c reset_shared_bus>
th ffffff04faf808a0 tm  17698729: clexec407,3:worker_thread fork1 </usr/cluster/lib/sc/run_reserve -c reset_shared_bus> fd 3
th ffffff04f9f48740 tm  17698876: clexec682,3:execl </usr/cluster/lib/sc/run_reserve -c reset_shared_bus>
th ffffff04f833c080 tm  17707441: clexec407,2:catch signal 18 18 si_code 1 si_pid 682 si_uid 3
th ffffff04faf808a0 tm  17707444: clexec407,3:<0> fd 3 retval 0 data.len 6
th ffffff04f8339c20 tm  17707457: clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c reset_shared_bus> error 0
th ffffff04f9f54760 tm  17707462: cmm_callback_worker:ha_mounter.1 exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus excep 0
th ffffff04fa50cc80 tm  17709101: mount_client_impl::activate:ha_mounter.1 alive 1 except 0
th ffffff04f833c080 tm  17717030: clexec407,2:do_log clexecd: Got an unexpected signal 18 in process work_process (pid=407, ppid=406)
th ffffff04f833c080 tm  17717055: clexec407,2:do_exit 1
th ffffff04f833b880 tm  17717415: clexec406,3:do_log clexecd: Daemon exiting because child died.
th ffffff04f833b880 tm  17717437: clexec406,3:do_exit 4
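
In case it helps anyone reading the trace above, here is a minimal sketch of a helper that splits it into (thread, timestamp, message) entries so the events of one thread or process can be filtered out. It is not part of OHAC; the names ENTRY and parse_trace are made up for illustration, and the "th <hex> tm <number>: <message>" layout is assumed purely from the dump above. It tolerates entries that a mail client has wrapped across lines.

```python
import re

# Assumed entry layout, taken from the dump above:
#   th <hex thread id> tm <timestamp>: <message>
ENTRY = re.compile(r"\bth\s+([0-9a-f]+)\s+tm\s+(\d+):\s*(.*)")


def parse_trace(text):
    """Return a list of (thread, tm, message) tuples.

    Lines that do not start a new entry are treated as the wrapped
    continuation of the previous entry and appended to its message.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.search(line)
        if match:
            entries.append([match.group(1), int(match.group(2)), match.group(3)])
        elif entries:
            # Continuation line: join with a space unless the message is still empty.
            sep = " " if entries[-1][2] else ""
            entries[-1][2] += sep + line
    return [tuple(entry) for entry in entries]
```

For example, [e for e in parse_trace(dump) if "clexec407" in e[2]] would list only the entries logged by the worker process (pid 407), where dump is the pasted trace as a string.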


So this command fails on boot:

/usr/cluster/lib/sc/reserve -c reset_shared_bus -h ohac-test-2

Btw, I tried that on b124 and b127.

On Wed, Nov 25, 2009 at 11:53 AM, Piotr Jasiukajtis <estseg at gmail.com> wrote:
> Hi,
>
> I have built OHAC core+agents against ON b127.
>
> Once the cluster is booted, the 'clexecd' daemon crashes and failfast
> restarts the node.
>
> Oct 10 14:43:10 ohac-test-2 : [ID 719008 daemon.error] clexecd: Got an
> unexpected signal 18 in process work_process (pid=427, ppid=426)
> Oct 10 14:43:10 ohac-test-2 Cluster.Framework: [ID 899305
> daemon.error] clexecd: Daemon exiting because child died.
> Oct 10 14:43:13 ohac-test-2 savecore: [ID 570001 auth.error] reboot
> after panic: Failfast: Aborting zone "global" (zone ID 0) because
> "clexecd" died 30 seconds ago.
>
> Has anyone from Sun tried to build OHAC core against the latest ON
> build? If so, do you have a patch for that or something?
>
> Thanks and regards,
>
> --
> Piotr Jasiukajtis | estibi | SCA OS0072
> http://estseg.blogspot.com
>



-- 
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
