After setting verbose mode for '/usr/cluster/lib/sc/reserve':

Oct 11 15:36:16 ohac-test-2 genunix: [ID 936769 kern.notice] asy1 is
/pci@0,0/isa@1f/asy@1,3f8
Oct 11 15:36:19 ohac-test-2 genunix: [ID 965873 kern.notice] NOTICE:
CMM: Node ohac-test-2 (nodeid = 1) with votecount = 1 added.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 843983 kern.notice] NOTICE:
CMM: Node ohac-test-2: attempting to join cluster.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 525628 kern.notice] NOTICE:
CMM: Cluster has reached quorum.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 377347 kern.notice] NOTICE:
CMM: Node ohac-test-2 (nodeid = 1) is up; new incarnation number =
1255268179.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 108990 kern.notice] NOTICE:
CMM: Cluster members: ohac-test-2.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 279084 kern.notice] NOTICE:
CMM: node reconfiguration #1 completed.
Oct 11 15:36:19 ohac-test-2 genunix: [ID 499756 kern.notice] NOTICE:
CMM: Node ohac-test-2: joined cluster.
Oct 11 15:36:19 ohac-test-2 ip: [ID 856290 kern.notice] ip: joining
multicasts failed (18) on clprivnet0 - will use link layer broadcasts
for multicast
Oct 11 15:36:21 ohac-test-2 Cluster.CCR: [ID 914260 daemon.warning]
Failed to retrieve global fencing status from the global name server
Oct 11 15:36:21 ohac-test-2 last message repeated 1 time
Oct 11 15:36:21 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d3s2, will retry in 2 seconds
Oct 11 15:36:21 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d2s2, will retry in 2 seconds
Oct 11 15:36:22 ohac-test-2 sendmail[588]: [ID 702911 mail.crit] My
unqualified host name (ohac-test-2) unknown; sleeping for retry
Oct 11 15:36:22 ohac-test-2 unix: [ID 469452 kern.notice] NOTICE:
ncrs: 64-bit driver module not found
Oct 11 15:36:23 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d3s2, will retry in 2 seconds
Oct 11 15:36:23 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d2s2, will retry in 2 seconds
Oct 11 15:36:25 ohac-test-2 last message repeated 1 time
Oct 11 15:36:25 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d3s2, will retry in 2 seconds
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 192619 daemon.warning]
reservation error(node_join) - Unable to open device
/dev/did/rdsk/d2s2
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 192619 daemon.warning]
reservation error(node_join) - Unable to open device
/dev/did/rdsk/d3s2
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0ctl: No
such file or directory
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0: No such
file or directory
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0dsp: No
such file or directory
Oct 11 15:36:27 ohac-test-2 last message repeated 1 time
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0mixer: No
such file or directory
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0ctl: No
such file or directory
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 317882 daemon.error]
build_devlink_list: readlink failed for /dev/sound/audiohd:0: No such
file or directory
Oct 11 15:36:28 ohac-test-2 Cluster.Framework: [ID 801593
daemon.notice] stdout: there are 0 dead nodes
Oct 11 15:36:28 ohac-test-2 : [ID 540309 daemon.error] clexecd: Got an
unexpected signal 18 in process work_process (pid=413, ppid=412)
Oct 11 15:36:28 ohac-test-2 Cluster.Framework: [ID 899305
daemon.error] clexecd: Daemon exiting because child died.
Oct 11 15:36:33 ohac-test-2 Cluster.PNM: [ID 745275 daemon.error] PNM
daemon system error: bnxe0 SIOCGXARP failed.: No such device or
address
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 922726 daemon.notice]
The status of device: /dev/did/rdsk/d1s0 is set to MONITORED
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 489913 daemon.notice]
The state of the path to device: /dev/did/rdsk/d1s0 has changed to OK
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 922726 daemon.notice]
The status of device: /dev/did/rdsk/d2s0 is set to MONITORED
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 977412 daemon.notice]
The state of the path to device: /dev/did/rdsk/d2s0 has changed to
FAILED
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 922726 daemon.notice]
The status of device: /dev/did/rdsk/d3s0 is set to MONITORED
Oct 11 15:36:33 ohac-test-2 Cluster.scdpmd: [ID 977412 daemon.notice]
The state of the path to device: /dev/did/rdsk/d3s0 has changed to
FAILED
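
The reservation messages above are easier to compare between boots when they are tallied per device. Below is a minimal sketch, not part of the cluster tooling: it assumes the log is available as plain text in the wrapped form shown here, and the function names (`join_wrapped`, `summarize`) are made up for illustration. It re-joins the wrapped records and counts retry warnings and final errors for each DID device.

```python
import re
from collections import defaultdict

# A syslog record starts with a timestamp such as "Oct 11 15:36:21 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")

# Matches both the retry warnings and the final errors from the reserve command.
RESERVATION = re.compile(
    r"reservation (warning|error)\(node_join\) - Unable to open device "
    r"(/dev/did/rdsk/\w+)"
)


def join_wrapped(lines):
    """Re-join records that were wrapped across several lines in the mail."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize(lines):
    """Count warnings and errors per device path."""
    summary = defaultdict(lambda: {"warning": 0, "error": 0})
    for record in join_wrapped(lines):
        match = RESERVATION.search(record)
        if match:
            level, device = match.groups()
            summary[device][level] += 1
    return dict(summary)


# Two records copied from the log above, in their wrapped form.
sample = """\
Oct 11 15:36:21 ohac-test-2 Cluster.CCR: [ID 551094 daemon.warning]
reservation warning(node_join) - Unable to open device
/dev/did/rdsk/d3s2, will retry in 2 seconds
Oct 11 15:36:27 ohac-test-2 Cluster.CCR: [ID 192619 daemon.warning]
reservation error(node_join) - Unable to open device
/dev/did/rdsk/d3s2
""".splitlines()

if __name__ == "__main__":
    for device, counts in sorted(summarize(sample).items()):
        print(device, counts)
```

The sketch ignores "last message repeated N time" lines, so the counts are a lower bound on how often each message was logged.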

On Thu, Nov 26, 2009 at 12:20 PM, Piotr Jasiukajtis <estseg at gmail.com> wrote:
> Another update:
>
> 643/1:  zone_lookup(0x00000000)                           = 0
> 643/1:  zone_lookup(0x00000000)                           = 0
> 643/1:  zone_lookup(0x00000000)                           = 0
> 643/1:  lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7)  = 0xFFBFFEFF [0x0000FFFF]
> 643/1:  door_call(256, 0x08046C80)                        = 0
> 643/1:  lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000)  = 0xFFBFFEFF [0x0000FFFF]
> 643/1:  close(4)                                          = 0
> 643/1:  zone_lookup(0x00000000)                           = 0
> 643/1:  lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7)  = 0xFFBFFEFF [0x0000FFFF]
> 643/1:  door_call(259, 0x08046D40)                        = 0
> 643/1:  lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000)  = 0xFFBFFEFF [0x0000FFFF]
> 643/1:  brk(0x081BF608)                                   = 0
> 643/1:  brk(0x081DF608)                                   = 0
> 643/1:  getpid()                                          = 643 [639]
> 643/1:  close(263)                                        = 0
> 643/1:  fxstat(2, -1, 0x08046FF0)                         Err#9 EBADF
> 643/1:  lseek(1, 0, SEEK_CUR)                             Err#29 ESPIPE
> 643/1:  lseek(2, 0, SEEK_CUR)                             = 50933
> 643/1:  lseek(2, 0, SEEK_CUR)                             = 50974
> 643/1:  lseek(1, 0, SEEK_CUR)                             Err#29 ESPIPE
> 643/1:  lseek(2, 0, SEEK_CUR)                             = 51062
> 643/1:  lseek(2, 0, SEEK_CUR)                             = 51103
> 643/1:  _exit(0)
>
> Full truss output attached.
>
> On Thu, Nov 26, 2009 at 10:51 AM, Piotr Jasiukajtis <estseg at gmail.com> 
> wrote:
>> Hi,
>>
>> Thread update:
>>
>> > *cladm_dbg/s
>> 0xffffff04f55ad008:             th ffffff04f834b580 tm 17107404:
>> failfastd(385):start
>> th ffffff04f834b580 tm 17107413: failfastd(385):fork1
>> th ffffff04f834b580 tm 17107464: failfastd(385):fork1
>> th ffffff04f834b580 tm 17107465: failfastd(385):done
>> th ffffff04f8349880 tm 17107509: failfastd(393):fork1
>> th ffffff04f8351c20 tm 17107541: cl_exec384,1:Main: Default sched class = 1
>> th ffffff04f8351c20 tm 17107543: cl_exec384,1:Main: starting the
>> cl_exec service
>> th ffffff04f8351c20 tm 17107744: cl_exec384,1:Main: wait for daemon to be
>> ready
>> th ffffff04f8351c20 tm 17107753: cl_exec384,1:Main: cl_exec server
>> object : <cl_exec.1>
>> th ffffff04f8349880 tm 17107857: failfastd(393):ready
>> th ffffff04f8349880 tm 17107866: failfastd(393):synchro file
>> th ffffff04f8349880 tm 17107869: failfastd(393):write pipe
>> th ffffff04f834b580 tm 17107870: failfastd(385):read pipe
>> th ffffff04f834b580 tm 17107871: failfastd(385):exit
>> th ffffff04ea72be20 tm 17107892: cl_exec394,1:Worker: create daemon process
>> th ffffff04ea72be20 tm 17107945: cl_exec394,1:Worker: starting
>> th ffffff04ea72be20 tm 17107950: cl_exec394,1:Worker: create signals thread
>> th ffffff04f8348a80 tm 17107961: cl_exec394,3:signal thread starting
>> th ffffff04f8348e00 tm 17108028: cl_exec395,1:Daemon: starting daemon
>> process
>> th ffffff04f8348e00 tm 17108101: cl_exec395,1:Daemon: create signals thread
>> th ffffff04eb1e0580 tm 17108110: cl_exec395,3:signal thread starting
>> th ffffff04f8348e00 tm 17108223: cl_exec395,1:Daemon: bind server
>> object <cl_exec.1>
>> th ffffff04f8351c20 tm 17117775: cl_exec384,1:Main: wait for cl_exec obj
>> th ffffff04f8351c20 tm 17117839: cl_exec384,1:Main: cl_exec obj
>> resolved in name server
>> th ffffff04f8351c20 tm 17117841: cl_exec384,1:Main: daemon is ready
>> th ffffff04f8351c20 tm 17117841: cl_exec384,1:Main: service is online
>> th ffffff04f833cb00 tm 17122308: clexec405,1:main
>> th ffffff04f833cb00 tm 17122312: clexec405,1:daemonize
>> th ffffff04f833cb00 tm 17122363: clexec405,1:daemonize fork
>> th ffffff04f833cb00 tm 17122363: clexec405,1:wait_for_daemon
>> th ffffff04f833c780 tm 17122420: clexec406,1:create_process_pair fork1
>> th ffffff04f833c780 tm 17122464: clexec406,1:daemon_process
>> th ffffff04f833c400 tm 17122513: clexec407,1:worker_process
>> th ffffff04f833c780 tm 17122927: clexec406,1:daemon_process ready
>> th ffffff04f833cb00 tm 17132542: clexec405,1:wait ha_mounter
>> th ffffff04f833c400 tm 17132593: clexec407,1:wait signal thread
>> th ffffff04f833cb00 tm 17132603: clexec405,1:nameserver resolved
>> th ffffff04f833cb00 tm 17132612: clexec405,1:end file created
>> th ffffff04f833cb00 tm 17132613: clexec405,1:main_end
>> th ffffff04f9f54760 tm 17698695: cmm_callback_worker:ha_mounter.1
>> exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus
>> th ffffff04f8339c20 tm 17698705:
>> clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c
>> reset_shared_bus>
>> th ffffff04faf808a0 tm 17698729: clexec407,3:worker_thread fork1
>> </usr/cluster/lib/sc/run_reserve -c reset_shared_bus> fd 3
>> th ffffff04f9f48740 tm 17698876: clexec682,3:execl
>> </usr/cluster/lib/sc/run_reserve -c reset_shared_bus>
>> th ffffff04f833c080 tm 17707441: clexec407,2:catch signal 18 18
>> si_code 1 si_pid 682 si_uid 3
>> th ffffff04faf808a0 tm 17707444: clexec407,3:<0> fd 3 retval 0 data.len 6
>> th ffffff04f8339c20 tm 17707457:
>> clexec406,11:execit</usr/cluster/lib/sc/run_reserve -c
>> reset_shared_bus> error 0
>> th ffffff04f9f54760 tm 17707462: cmm_callback_worker:ha_mounter.1
>> exec /usr/cluster/lib/sc/run_reserve -c reset_shared_bus excep 0
>> th ffffff04fa50cc80 tm 17709101:
>> mount_client_impl::activate:ha_mounter.1 alive 1 except 0
>> th ffffff04f833c080 tm 17717030: clexec407,2:do_log clexecd: Got an
>> unexpected signal 18 in process work_process (pid=407, ppid=406)
>> th ffffff04f833c080 tm 17717055: clexec407,2:do_exit 1
>> th ffffff04f833b880 tm 17717415: clexec406,3:do_log clexecd: Daemon
>> exiting because child died.
>> th ffffff04f833b880 tm 17717437: clexec406,3:do_exit 4
>>
>>
>> So this command fails on boot:
>>
>> /usr/cluster/lib/sc/reserve -c reset_shared_bus -h ohac-test-2
>>
>> Btw, I tried that on b124 and b127.
>>
>> On Wed, Nov 25, 2009 at 11:53 AM, Piotr Jasiukajtis <estseg at gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> I have built OHAC core+agents against ON b127.
>>>
>>> Once the cluster is booted, the 'clexecd' daemon crashes and failfast
>>> restarts the node.
>>>
>>> Oct 10 14:43:10 ohac-test-2 : [ID 719008 daemon.error] clexecd: Got an
>>> unexpected signal 18 in process work_process (pid=427, ppid=426)
>>> Oct 10 14:43:10 ohac-test-2 Cluster.Framework: [ID 899305
>>> daemon.error] clexecd: Daemon exiting because child died.
>>> Oct 10 14:43:13 ohac-test-2 savecore: [ID 570001 auth.error] reboot
>>> after panic: Failfast: Aborting zone "global" (zone ID 0) because
>>> "clexecd" died 30 seconds ago.
>>>
>>> Has anyone from Sun tried to build OHAC core against the latest ON
>>> build? If so, do you have a patch for that or something?
>>>
>>> Thanks and regards,
>>>
>>> --
>>> Piotr Jasiukajtis | estibi | SCA OS0072
>>> http://estseg.blogspot.com
>>>
>>
>>
>>
>> --
>> Piotr Jasiukajtis | estibi | SCA OS0072
>> http://estseg.blogspot.com
>>
>
>
>
> --
> Piotr Jasiukajtis | estibi | SCA OS0072
> http://estseg.blogspot.com
>
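
One possible reading of the cladm_dbg trace quoted above: on Solaris, signal 18 is SIGCHLD, and for SIGCHLD an si_code of 1 is CLD_EXITED. If that applies here, the "unexpected signal 18" that the worker process logs would be the ordinary exit notification for pid 682, the child that ran run_reserve. This is an interpretation of the trace, not a confirmed diagnosis. The sketch below decodes such a "catch signal" record. The tables are hard-coded with Solaris values because Python's `signal` module reports the numbering of the host it runs on; the function name `decode_catch` is made up for illustration, and the meaning of the second "18" in the record is not known, so it is skipped.

```python
import re

# Solaris signal numbers (subset) and SIGCHLD si_code values, hard-coded so the
# result does not depend on the operating system running this script.
SOLARIS_SIGNALS = {1: "SIGHUP", 2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM", 18: "SIGCHLD"}
CLD_CODES = {
    1: "CLD_EXITED",
    2: "CLD_KILLED",
    3: "CLD_DUMPED",
    4: "CLD_TRAPPED",
    5: "CLD_STOPPED",
    6: "CLD_CONTINUED",
}

# The trace prints the signal number followed by a second number whose meaning
# is unknown here; it is matched but not captured.
CATCH = re.compile(
    r"catch signal (\d+)(?: \d+)? si_code (\d+) si_pid (\d+) si_uid (\d+)"
)


def decode_catch(record):
    """Decode one 'catch signal' trace record, or return None if it does not match."""
    match = CATCH.search(record)
    if match is None:
        return None
    signo, code, pid, uid = map(int, match.groups())
    name = SOLARIS_SIGNALS.get(signo, f"signal {signo}")
    # si_code values are only interpreted for SIGCHLD; other signals use
    # different code tables.
    if name == "SIGCHLD":
        code_name = CLD_CODES.get(code, f"si_code {code}")
    else:
        code_name = f"si_code {code}"
    return {"signal": name, "code": code_name, "pid": pid, "uid": uid}


# The record from the trace, with the mail wrapping removed.
record = (
    "th ffffff04f833c080 tm 17707441: clexec407,2:catch signal 18 18 "
    "si_code 1 si_pid 682 si_uid 3"
)

if __name__ == "__main__":
    print(decode_catch(record))
```

In the same trace, pid 682 is the process that ran execl on run_reserve, and the execit call for it returns "error 0", which would be consistent with a child that exited normally.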



-- 
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
