Andreas,
I made the networking changes Nathan suggested in config.sh. I also
checked the LUNs, and you were correct: sda on lustre1 is sdb on lustre2
and vice versa, so I changed config.sh to use sda1 on both OSTs.
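In case it's useful, a cross-mapping like that can be verified by
comparing SCSI WWIDs from both nodes. A sketch only; the scsi_id
invocation is from memory, so the flags may differ with your udev version:

for h in lustre1 lustre2; do
    for d in sda sdb; do
        echo -n "$h $d: "
        # Print each disk's WWID; matching WWIDs mean the same LUN.
        rsh $h /sbin/scsi_id -g -s /block/$d
    done
done

With the mapping above, lustre1's sda should print the same WWID as
lustre2's sdb, and vice versa.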
However, I still get exactly the same error when I try to mount the
client (and yes, it's still ENODEV, but why?):
[EMAIL PROTECTED] ~]# mount -v -t lustre lustrem:/mds-test/client /mnt/lustre
verbose: 1
arg[0] = /sbin/mount.lustre
arg[1] = lustrem:/mds-test/client
arg[2] = /mnt/lustre
arg[3] = -v
arg[4] = -o
arg[5] = rw
mds nid 0: [EMAIL PROTECTED]
mds name: mds-test
profile: client
options: rw
retry: 0
mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
Input/output error
mds nid 0: [EMAIL PROTECTED]
mds name: mds-test
profile: client
options: rw
retry: 0
[EMAIL PROTECTED] ~]#
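One check worth running while this is failing is whether mds-test is
actually set up on the MDS. A sketch, assuming the 1.4.8 userspace tools:

# On lustrem: list configured OBD devices; mds-test should be in state UP
lctl dl
# The same information is also available from procfs:
cat /proc/fs/lustre/devices

Judging from the "MDT mds-test has stopped" line in the log below, I'd
expect mds-test not to show up as UP.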
MDS (lustrem) /var/log/messages:
Feb 2 16:17:18 lustrem kernel: Lustre: OBD class driver Build Version:
1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp,
[EMAIL PROTECTED]
Feb 2 16:17:19 lustrem kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Feb 2 16:17:19 lustrem kernel: Lustre: Accept secure, port 988
Feb 2 16:17:19 lustrem kernel: loop: loaded (max 8 devices)
Feb 2 16:17:21 lustrem kernel: kjournald starting. Commit interval 5
seconds
Feb 2 16:17:21 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:21 lustrem kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.
Feb 2 16:17:21 lustrem kernel: Lustre:
3518:0:(mds_fs.c:239:mds_init_server_data()) mds-test: initializing new
last_rcvd
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test now serving
/dev/loop0 (b505d8f0-d424-4bf8-a8cd-8bfa8af0cf36) with recovery enabled
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test has stopped.
Feb 2 16:17:22 lustrem kernel: kjournald starting. Commit interval 5
seconds
Feb 2 16:17:22 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:22 lustrem kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.
Feb 2 16:17:22 lustrem kernel: Lustre: Binding irq 185 to CPU 0 with
cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:27 lustrem kernel: LustreError:
3882:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170454642, 5s ago) [EMAIL PROTECTED] x1/t0
o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError:
3680:0:(ldlm_lib.c:541:target_handle_connect()) @@@ UUID 'mds-test' is
not available for connect (not set up) [EMAIL PROTECTED] x27/t0
o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError:
3680:0:(ldlm_lib.c:1288:target_send_reply_msg()) @@@ processing error
(-19) [EMAIL PROTECTED] x27/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl
Interpret:/0/0 rc -19/0
OST (lustre1) /var/log/messages:
Feb 2 16:16:30 lustre1 kernel: Lustre: OBD class driver Build Version:
1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp,
[EMAIL PROTECTED]
Feb 2 16:16:30 lustre1 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Feb 2 16:16:30 lustre1 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:31 lustre1 kernel: Lustre: Filtering OBD driver;
[EMAIL PROTECTED]
Feb 2 16:17:00 lustre1 kernel: Lustre: Binding irq 185 to CPU 0 with
cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:00 lustre1 kernel: Lustre:
3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 1 offset 0 length 240: 2
Feb 2 16:17:25 lustre1 kernel: Lustre:
3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 4 offset 0 length 240: 2
Feb 2 16:17:50 lustre1 kernel: Lustre:
3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 6 offset 0 length 240: 2
Feb 2 16:18:15 lustre1 kernel: Lustre:
3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 8 offset 0 length 240: 2
Feb 2 16:18:40 lustre1 kernel: Lustre:
3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 10 offset 0 length 240: 2
OST (lustre2) /var/log/messages:
Feb 2 16:16:28 lustre2 kernel: Lustre: OBD class driver Build Version:
1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp,
[EMAIL PROTECTED]
Feb 2 16:16:28 lustre2 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Feb 2 16:16:28 lustre2 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:28 lustre2 kernel: Lustre: Filtering OBD driver;
[EMAIL PROTECTED]
Feb 2 16:16:53 lustre2 kernel: Lustre: Binding irq 185 to CPU 0 with
cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:16:53 lustre2 kernel: Lustre:
3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 2 offset 0 length 240: 2
Feb 2 16:17:18 lustre2 kernel: Lustre:
3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 5 offset 0 length 240: 2
Feb 2 16:17:43 lustre2 kernel: Lustre:
3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 6 match 7 offset 0 length 240: 2
Client (scnode01) /var/log/messages:
Feb 2 16:17:10 scnode01 kernel: LustreError:
19745:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x27/t0
o38->[EMAIL PROTECTED]@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
Feb 2 16:17:10 scnode01 kernel: LustreError: mdc_dev: The configuration
'client' could not be read from the MDS 'mds-test'. This may be the
result of communication errors between the client and the MDS, or if the
MDS is not running.
Feb 2 16:17:10 scnode01 kernel: LustreError:
19742:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client
config.sh:
#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OSTs, and 1 client. Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid [EMAIL PROTECTED] --nettype lnet
lmc -m config.xml --add net --node lustre1 --nid [EMAIL PROTECTED] --nettype lnet
lmc -m config.xml --add net --node lustre2 --nid [EMAIL PROTECTED] --nettype lnet
lmc -m config.xml --add net --node client --nid '*' --nettype lnet
#lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
#lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
#lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
#lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sda1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
    echo "Copying config.xml to OST lustre$i..."
    rcp -p config.xml [EMAIL PROTECTED]:~/lustre
done
for i in `seq -w 1 14`
do
    echo "Copying config.xml to client scnode$i..."
    rcp -p config.xml [EMAIL PROTECTED]:~/lustre
done
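For completeness, the servers are then started from the generated
config.xml with lconf, roughly as below. This is a sketch from memory, so
double-check the flags; note that --reformat destroys whatever is on the
target devices:

lconf --reformat --node lustrem config.xml   # MDS, first start only
lconf --reformat --node lustre1 config.xml   # OST on lustre1
lconf --reformat --node lustre2 config.xml   # OST on lustre2
# Later restarts drop --reformat, e.g.: lconf --node lustre1 config.xml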
Andreas Dilger wrote:
> On Feb 02, 2007 13:16 -0600, Kevin L. Buterbaugh wrote:
>> Sorry, meant to include that. Here's the relevant information from the
>> client (scnode01):
>> Feb 2 12:48:15 scnode01 kernel: LustreError:
>> 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
>> PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x13/t0
>> o38->[EMAIL PROTECTED]@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
>> Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration
>> 'client' could not be read from the MDS 'mds-test'. This may be the
>> result of communication errors between the client and the MDS, or if the
>> MDS is not running.
>
> Client couldn't connect to the MDS. -19 = -ENODEV
>
>> And from the MDS (lustrem):
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442057, 5s ago) [EMAIL PROTECTED] x1/t0
>> o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442082, 5s ago) [EMAIL PROTECTED] x4/t0
>> o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>
> These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
> What is in the OST syslog? Are you positive that /dev/sda1 and /dev/sdb1
> on the two nodes are set up the same way, so that e.g. lustre1+sda1 isn't
> talking to the same disk as lustre2+sdb1?
>
> Also minor nit - you don't need to have a partition table, it can hurt
> performance on some RAID setups because of the 512-byte offset of IOs
> due to the DOS partition table.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
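Re the partition-table nit: if I drop the partitions, I assume the OST
lines would just point at the whole disks, something like the following
(a sketch, assuming nothing else lives on those disks):

lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sda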
--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - [EMAIL PROTECTED]