Hi,
I'm encountering a problem when starting the "local" example (one
MDS, LOV, OST, and client, all on node "sun-n1-console").
# lmc -m test.xml --batch test.txt
# cat test.txt
--add node --node sun-n1-console
--add net --node sun-n1-console --nettype lnet --nid [EMAIL PROTECTED]
--add mds --node sun-n1-console --mds mds1 --fstype ldiskfs --dev /tmp/mds1-sun-n1-console --size 400000
--add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
--add ost --node sun-n1-console --lov lov1 --ost ost1-sun-n1-console --fstype ldiskfs --dev /tmp/ost1-sun-n1-console --size 400000
--add mtpt --node sun-n1-console --path /mnt/lustre --mds mds1 --lov lov1
The node has two Ethernet interfaces, eth0 and eth1, on separate subnets.
I deployed all Lustre components on eth1 (IP: 192.168.123.45, hostname:
sun-n1-console).
# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
xxx.yyy.zzz.ab public-host
192.168.123.45 sun-n1-console
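Since every Lustre component is meant to sit on eth1, it may be worth confirming that sun-n1-console maps only to the eth1 address. A minimal sketch, assuming a standard /etc/hosts layout; `hosts_lookup` is a hypothetical helper, not a Lustre tool, and unlike `getent hosts` it reads the file directly and ignores DNS:

```shell
#!/bin/sh
# Hypothetical helper: print every address mapped to a hostname in an
# /etc/hosts-style file. Comment lines are skipped; DNS is not consulted.
hosts_lookup() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v h="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++)       # fields 2..NF are hostnames/aliases
                if ($i == h) { print $1; next }
        }
    ' "$file"
}
```

Running `hosts_lookup sun-n1-console` against the file above should print only 192.168.123.45; any additional address in the output would mean the name is also mapped elsewhere in the file.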
With eth0 down, the "local" example deploys successfully; Lustre fails
to start only when eth0 is up (see attachment).
The error messages in /var/log/messages indicate that the MDS does
not respond (see below). I don't think the firewall is the cause, because
I've switched it off:
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
And here are the error messages:
# tail /var/log/messages
Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x22/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at 1177061855, 0s ago) [EMAIL PROTECTED] x22/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(client.c:947:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Apr 20 17:38:00 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x23/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.683:64): avc: denied { rawip_recv } for pid=6537 comm="socknal_cd03" saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.884:65): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:27 sun-n1-console kernel: audit(1177061907.090:67): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:28 sun-n1-console kernel: audit(1177061908.698:68): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError: 6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: audit(1177061910.683:69): avc: denied { rawip_send } for pid=6539 comm="acceptor_988" saddr=192.168.123.45 src=988 daddr=192.168.123.45 dest=1023 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError: 6537:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: LustreError: Connection to [EMAIL PROTECTED] at host 192.168.123.45 on port 988 was reset: is it running a compatible version of Lustre and is [EMAIL PROTECTED] one of its NIDs?
Apr 20 17:38:50 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x25/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Apr 20 17:39:15 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x26/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
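The excerpt above interleaves LustreError lines with kernel audit (avc) records. A minimal sketch for extracting just those two kinds of entries from a syslog file and counting each, assuming the default /var/log/messages path; `lustre_log_summary` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: show LustreError and SELinux "avc: denied" entries
# from a syslog-style file, then print how many of each were found.
lustre_log_summary() {
    file="${1:-/var/log/messages}"
    # Matching entries, in their original order.
    grep -E 'LustreError|avc: +denied' "$file"
    # grep -c prints 0 (and exits 1) when nothing matches; the count is still captured.
    printf 'LustreError lines: %s\n' "$(grep -c 'LustreError' "$file")"
    printf 'avc denials: %s\n' "$(grep -cE 'avc: +denied' "$file")"
}
```

Call it with no argument to read /var/log/messages, or pass the path of another log file.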
Any advice on how to make this simple example work?
Regards,
Verdi
Attachment (lconf output):
[EMAIL PROTECTED] tmp]# lconf --reformat --verbose hoho.xml
configuring for host: ['sun-n1-console']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID
loading module: libcfs srcdir None devdir libcfs
+ /sbin/modprobe libcfs
loading module: lnet srcdir None devdir lnet
+ /sbin/modprobe lnet
+ /sbin/modprobe lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
+ /sbin/modprobe ksocklnd
Service: ldlm ldlm ldlm_UUID
loading module: lvfs srcdir None devdir lvfs
+ /sbin/modprobe lvfs
loading module: obdclass srcdir None devdir obdclass
+ /sbin/modprobe obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
+ /sbin/modprobe ptlrpc
Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
loading module: ost srcdir None devdir ost
+ /sbin/modprobe ost
loading module: ldiskfs srcdir None devdir ldiskfs
+ /sbin/modprobe ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
+ /sbin/modprobe fsfilt_ldiskfs
loading module: obdfilter srcdir None devdir obdfilter
+ /sbin/modprobe obdfilter
Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
original inode_size 0
stripe_count 1 inode_size 512
loading module: mdc srcdir None devdir mdc
+ /sbin/modprobe mdc
loading module: osc srcdir None devdir osc
+ /sbin/modprobe osc
loading module: lov srcdir None devdir lov
+ /sbin/modprobe lov
loading module: mds srcdir None devdir mds
+ /sbin/modprobe mds
Service: mountpoint MNT_sun-n1-console MNT_sun-n1-console_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7cd952c>, 0, 1, 1)] [u'ost1-sun-n1-console_UUID'] 1
loading module: llite srcdir None devdir llite
+ /sbin/modprobe llite
+ sysctl lnet/debug_path /tmp/lustre-log-sun-n1-console
+ /usr/sbin/lctl modules > /tmp/ogdb-sun-n1-console
Service: network NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID
NETWORK: NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID lnet [EMAIL PROTECTED]
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
OSD: ost1-sun-n1-console ost1-sun-n1-console_UUID obdfilter /tmp/ost1-sun-n1-console 400000 ldiskfs no 0 256
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=400000 of=/tmp/ost1-sun-n1-console
+ mkfs.ext2 -j -b 4096 -F -I 256 /tmp/ost1-sun-n1-console 100000
+ tune2fs -O dir_index /tmp/ost1-sun-n1-console
+ losetup /dev/loop0
+ losetup /dev/loop0 /tmp/ost1-sun-n1-console
+ dumpe2fs -f -h /dev/loop0
no external journal found for /dev/loop0
OST mount options: errors=remount-ro
+ /usr/sbin/lctl
attach obdfilter ost1-sun-n1-console ost1-sun-n1-console_UUID
quit
+ /usr/sbin/lctl
cfg_device ost1-sun-n1-console
setup /dev/loop0 ldiskfs f errors=remount-ro
quit
+ /usr/sbin/lctl
attach ost OSS OSS_UUID
quit
+ /usr/sbin/lctl
cfg_device OSS
setup
quit
Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
original inode_size 0
stripe_count 1 inode_size 512
MDSDEV: mds1 mds1_UUID /tmp/mds1-sun-n1-console ldiskfs no
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=400000 of=/tmp/mds1-sun-n1-console
+ mkfs.ext2 -j -b 4096 -F -i 4096 -I 512 /tmp/mds1-sun-n1-console 100000
+ tune2fs -O dir_index /tmp/mds1-sun-n1-console
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop1 /tmp/mds1-sun-n1-console
+ /usr/sbin/lctl
attach mds mds1 mds1_UUID
quit
+ /usr/sbin/lctl
cfg_device mds1
setup /dev/loop1 ldiskfs
quit
recording clients for filesystem: FS_fsname_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7cd988c>, 0, 1, 1)] [u'ost1-sun-n1-console_UUID'] 1
+ /usr/sbin/lctl
device $mds1
probe
clear_log mds1
quit
Recording log mds1 on mds1
dbg LOV prepare
dbg LOV prepare: [(<__main__.OSC instance at 0xb7cd988c>, 0, 1, 1)] [u'ost1-sun-n1-console_UUID']
LOV: lov_mds1 4300b_lov_mds1_fe6fd41018 mds1_UUID 1 1048576 0 0 [u'ost1-sun-n1-console_UUID'] mds1
+ /usr/sbin/lctl
device $mds1
record mds1
attach lov lov_mds1 4300b_lov_mds1_fe6fd41018
lov_setup lov1_UUID 1 1048576 0 0
quit
OSC: OSC_sun-n1-console_ost1-sun-n1-console_mds1 4300b_lov_mds1_fe6fd41018 ost1-sun-n1-console_UUID
dbg CLIENT __prepare__: ost1-sun-n1-console_UUID [<__main__.Network instance at 0xb7cd9c6c>]
+ /usr/sbin/lctl
device $mds1
record mds1
add_uuid sun-n1-console_UUID [EMAIL PROTECTED]
ost1-sun-n1-console_UUID active
+ /usr/sbin/lctl
device $mds1
record mds1
attach osc OSC_sun-n1-console_ost1-sun-n1-console_mds1 4300b_lov_mds1_fe6fd41018
quit
+ /usr/sbin/lctl
device $mds1
record mds1
cfg_device OSC_sun-n1-console_ost1-sun-n1-console_mds1
setup ost1-sun-n1-console_UUID sun-n1-console_UUID
quit
+ /usr/sbin/lctl
device $mds1
record mds1
cfg_device lov_mds1
lov_modify_tgts add lov_mds1 ost1-sun-n1-console_UUID 0 1
quit
+ /usr/sbin/lctl
device $mds1
record mds1
mount_option mds1 lov_mds1
quit
End recording log mds1 on mds1
Recording log sun-n1-console on mds1
+ /usr/sbin/lconf -v --record --nomod --old_conf --record_log sun-n1-console --record_device mds1 --node sun-n1-console hoho.xml
record> configuring for host: ['sun-n1-console']
record> Checking XML modification time
record> + debugfs -c -R 'stat /LOGS' /tmp/mds1-sun-n1-console 2>&1 | grep mtime
record> Can not get mtime info of MDS LOGS directory
record> + /usr/sbin/lctl
record> device $mds1
record> probe
record> clear_log sun-n1-console
record> quit
record> Recording log sun-n1-console on mds1
record> Service: network NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID
record> Service: ldlm ldlm ldlm_UUID
record> Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
record> Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
record> original inode_size 0
record> stripe_count 1 inode_size 512
record> Service: mountpoint MNT_sun-n1-console MNT_sun-n1-console_UUID
record> get_lov_tgts failed, using get_refs
record> dbg LOV __init__: [(<__main__.OSC instance at 0xb7cf64cc>, 0, 1, 1)] [u'ost1-sun-n1-console_UUID'] 1
record> dbg LOV prepare
record> dbg LOV prepare: [(<__main__.OSC instance at 0xb7cf64cc>, 0, 1, 1)] [u'ost1-sun-n1-console_UUID']
record> LOV: lov1 028ec_lov1_fa9d4fa5b7 mds1_UUID 1 1048576 0 0 [u'ost1-sun-n1-console_UUID'] mds1
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach lov lov1 028ec_lov1_fa9d4fa5b7
record> lov_setup lov1_UUID 1 1048576 0 0
record> quit
record> OSC: OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console 028ec_lov1_fa9d4fa5b7 ost1-sun-n1-console_UUID
record> dbg CLIENT __prepare__: ost1-sun-n1-console_UUID [<__main__.Network instance at 0xb7cf66cc>]
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> add_uuid sun-n1-console_UUID [EMAIL PROTECTED]
record> ost1-sun-n1-console_UUID active
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach osc OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console 028ec_lov1_fa9d4fa5b7
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console
record> setup ost1-sun-n1-console_UUID sun-n1-console_UUID
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device lov1
record> lov_modify_tgts add lov1 ost1-sun-n1-console_UUID 0 1
record> quit
record> MDC: MDC_sun-n1-console_mds1_MNT_sun-n1-console 0cf7b_MNT_sun-n1-console_dd8b963906 mds1_UUID
record> dbg CLIENT __prepare__: mds1_UUID [<__main__.Network instance at 0xb7cf6a4c>]
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> add_uuid sun-n1-console_UUID [EMAIL PROTECTED]
record> mds1_UUID active
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach mdc MDC_sun-n1-console_mds1_MNT_sun-n1-console 0cf7b_MNT_sun-n1-console_dd8b963906
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device MDC_sun-n1-console_mds1_MNT_sun-n1-console
record> setup mds1_UUID sun-n1-console_UUID
record> quit
record> MTPT: MNT_sun-n1-console MNT_sun-n1-console_UUID /mnt/lustre mds1_UUID lov1_UUID
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> mount_option sun-n1-console lov1 MDC_sun-n1-console_mds1_MNT_sun-n1-console
record> quit
record> End recording log sun-n1-console on mds1
+ /usr/sbin/lctl
ignore_errors
cfg_device $mds1
cleanup
detach
quit
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup -d /dev/loop1
changing mtime of LOGS to 1177060884
+ mktemp /tmp/lustre-cmd.XXXXXXXX
+ debugfs -w -R "mi /LOGS" </tmp/lustre-cmd.mEPL5082 /tmp/mds1-sun-n1-console
MDSDEV: mds1 mds1_UUID /tmp/mds1-sun-n1-console ldiskfs 400000 no
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop1 /tmp/mds1-sun-n1-console
+ /usr/sbin/lctl
attach mdt MDT MDT_UUID
quit
+ /usr/sbin/lctl
cfg_device MDT
setup
quit
+ dumpe2fs -f -h /dev/loop1
no external journal found for /dev/loop1
MDS mount options: errors=remount-ro
+ /usr/sbin/lctl
attach mds mds1 mds1_UUID
quit
+ /usr/sbin/lctl
cfg_device mds1
setup /dev/loop1 ldiskfs mds1 errors=remount-ro
quit
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss