All,
Apologies in advance for the long newbie question. I've Googled this, but the hits I've found haven't resolved it; other people seem to have hit the same problem, and as far as I can tell I'm already doing what was suggested to them. If I've missed a URL where this is answered or explained, please feel free to point me in that direction...
I'm trying to get Lustre 1.4.8 going on a test cluster. I installed the
software from the pre-packaged rpm's and rebooted. All my nodes show
"uname -a" output similar to the following:
Linux lustrem 2.6.9-42.0.3.EL_lustre.1.4.8smp #1 SMP Tue Dec 19 09:07:46
MST 2006 x86_64 x86_64 x86_64 GNU/Linux
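(In case the 32-bit / 64-bit package mix turns out to matter: I can post, per node, the output of the two quick checks below. Nothing Lustre-specific here, just the installed rpm's and the running kernel; the package names obviously differ between the Opterons and the P4's.)

rpm -qa | grep -i lustre
uname -r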
My cluster consists of 5 dual-processor Opterons and 14 dual-processor P4's (mixing 32-bit and 64-bit nodes is fine as long as the right rpm's are installed on each, isn't it?). One of the Opterons is my MDS (hostname: lustrem); the other four are my OSD's (hostnames: lustre1 - 4). I have two dual-controller FC storage arrays. Both controllers in the first array are connected to two of my OSD's (lustre1 / 2), and the second array is connected to lustre3 / 4 in the same way. Each array has 2 RAID 5 LUNs defined on it. lustre1 and lustre2 can both see both of their array's RAID 5 LUNs, as the following shows:
[EMAIL PROTECTED] ~]# fdisk -l
Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[EMAIL PROTECTED] ~]#
[EMAIL PROTECTED] ~]# fdisk -l
Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[EMAIL PROTECTED] ~]#
To simplify things, I'm starting out with only my MDS (lustrem), 2 of my OSD's (lustre1 / 2), and one of my clients (scnode01).
Here's my config.sh:
#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OST's, and 1 client. Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sdb1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
echo "Copying config.xml to OST lustre$i..."
rcp -p config.xml [EMAIL PROTECTED]:~/lustre
done
for i in `seq -w 1 14`
do
echo "Copying config.xml to client scnode$i..."
rcp -p config.xml [EMAIL PROTECTED]:~/lustre
done
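(As a sanity check on the generated config.xml, something like

grep -i 'name=' config.xml

should, I think, list the node, MDS, OST, and LOV entries if all the lmc commands took effect; that's just my assumption about the XML lmc writes. I can post the full config.xml if that would help.)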
After running this script, I logged in to lustre1 and executed "lconf
--reformat --node lustre1 config.xml", which produces the following output:
loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre1_tcp NET_lustre1_tcp_UUID tcp lustre1
OSD: ost1-test ost1-test_UUID obdfilter /dev/sda1 0 ldiskfs no 0 256
And running "lconf --reformat --node lustre2 config.xml" on lustre2
produces the following output:
loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre2_tcp NET_lustre2_tcp_UUID tcp lustre2
OSD: ost2-test ost2-test_UUID obdfilter /dev/sdb1 0 ldiskfs no 0 256
Next, I logged in to lustrem and executed "lconf --reformat --node lustrem config.xml" and saw the following:
loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: mdc srcdir None devdir mdc
loading module: osc srcdir None devdir osc
loading module: lov srcdir None devdir lov
loading module: mds srcdir None devdir mds
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
NETWORK: NET_lustrem_tcp NET_lustrem_tcp_UUID tcp lustrem
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs no
recording clients for filesystem: FS_fsname_UUID
Recording log mds-test on mds-test
LOV: lov_mds-test 950ad_lov_mds-test_189f832962 mds-test_UUID 0 1048576
0 0 [u'ost1-test_UUID', u'ost2-test_UUID'] mds-test
OSC: OSC_lustrem_ost1-test_mds-test 950ad_lov_mds-test_189f832962
ost1-test_UUID
OSC: OSC_lustrem_ost2-test_mds-test 950ad_lov_mds-test_189f832962
ost2-test_UUID
End recording log mds-test on mds-test
Recording log client on mds-test
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs 50000 no
MDS mount options: errors=remount-ro
But when I log on to scnode01 and execute "mount -t lustre
lustrem:/mds-test/client /mnt/lustre", I get the following error:
mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
Input/output error
mds nid 0: [EMAIL PROTECTED]
mds name: mds-test
profile: client
options: rw
retry: 0
One other thing I've tried: instead of calling my client "client" in config.sh and in the mount command, I used the actual hostname (scnode01). That didn't help.
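In case it points at something obvious, here are the client-side checks I'm planning to run next on scnode01. These are my best guesses from the 1.4 docs, so the module names, the nid syntax, and the port number below are assumptions on my part; corrections welcome:

# 1. confirm the Lustre / LNET modules actually loaded on the client
lsmod | grep -E 'lnet|llite|lustre'
# 2. confirm LNET is configured for tcp on the client
grep lnet /etc/modprobe.conf    # expecting something like: options lnet networks=tcp
# 3. confirm basic reachability of the MDS (988 should be the default acceptor port)
ping -c 3 lustrem
telnet lustrem 988
# 4. if lctl ping is supported in 1.4.8, ping the MDS nid directly
lctl ping [email protected]     # substitute lustrem's real IP here
# 5. try starting the client with lconf instead of the zeroconf mount
lconf --node client config.xml  # run from the directory config.xml was copied to

Is the zeroconf mount even the right approach with this style of XML config, or should I be starting the clients with lconf?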
Again, I apologize for both the length of this post and the newbie
question, but I can't seem to figure this out on my own and I've got a
deadline looming. Any and all help (and even flames, as long as you
answer my question or point me in the right direction!) is appreciated...
--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - [EMAIL PROTECTED]