Hello,

 

I'm sorry if this is a double post; the message bounced back to me, and I'm not sure whether it went through or not.

 

I have spent the last couple of months designing the file system for a scientific cluster that I help administer.  After a good deal of testing, we are finally ready to put it to real use.  Our setup is slightly unusual, so I'll start by explaining what we are doing.

 

We rebuilt the kernel from the kernel-source-2.6.9-42.0.2.EL_lustre.1.4.7.1.x86_64.rpm package in order to slim down the features and to add root-over-NFS support.  We then built lustre-1.4.8 against that kernel.  One head node acts as the MDS as well as the boot server (it exports the root file system and runs tftpd); all of the other (slave) nodes boot off of that server and serve as the OSSes.  I use this script to generate the config file:

 

#############################

#############################

cms-lustre-config.sh

#############################

#############################

 

#!/bin/bash

 

rm -f cluster-production.xml

 

#-----------------

#Create the nodes |

#-----------------

 

lmc -m cluster-production.xml --add node --node osg1
lmc -m cluster-production.xml --add net --node osg1 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node node253
lmc -m cluster-production.xml --add net --node node253 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node node252
lmc -m cluster-production.xml --add net --node node252 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node node251
lmc -m cluster-production.xml --add net --node node251 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node node250
lmc -m cluster-production.xml --add net --node node250 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node node249
lmc -m cluster-production.xml --add net --node node249 --nid [EMAIL PROTECTED] --nettype lnet

lmc -m cluster-production.xml --add node --node client
lmc -m cluster-production.xml --add net --node client --nid '*' --nettype lnet

 

#--------------

#Configure MDS |

#--------------

 

lmc -m cluster-production.xml --add mds --node osg1 --mds cms-mds --fstype ldiskfs --dev /dev/sdb

 

#---------------

#Configure OSTs |

#---------------

 

lmc -m cluster-production.xml --add lov --lov cms-lov --mds cms-mds --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

 

#Head Node

#==========

#lmc -m cluster-production.xml --add ost --node osg1 --lov cms-lov --ost node001-ost --fstype ldiskfs --dev /dev/sdc

#==========

 

#Compute Nodes

#==========

 

#node253

lmc -m cluster-production.xml --add ost --node node253 --lov cms-lov --ost node253-ost-sda --fstype ldiskfs --dev /dev/sda
lmc -m cluster-production.xml --add ost --node node253 --lov cms-lov --ost node253-ost-sdb --fstype ldiskfs --dev /dev/sdb
lmc -m cluster-production.xml --add ost --node node253 --lov cms-lov --ost node253-ost-sdc --fstype ldiskfs --dev /dev/sdc
lmc -m cluster-production.xml --add ost --node node253 --lov cms-lov --ost node253-ost-sdd --fstype ldiskfs --dev /dev/sdd

 

#--------

 

#node252

lmc -m cluster-production.xml --add ost --node node252 --lov cms-lov --ost node252-ost-sda --fstype ldiskfs --dev /dev/sda
lmc -m cluster-production.xml --add ost --node node252 --lov cms-lov --ost node252-ost-sdb --fstype ldiskfs --dev /dev/sdb
lmc -m cluster-production.xml --add ost --node node252 --lov cms-lov --ost node252-ost-sdc --fstype ldiskfs --dev /dev/sdc
lmc -m cluster-production.xml --add ost --node node252 --lov cms-lov --ost node252-ost-sdd --fstype ldiskfs --dev /dev/sdd

 

#---------

 

#node251

lmc -m cluster-production.xml --add ost --node node251 --lov cms-lov --ost node251-ost-sda --fstype ldiskfs --dev /dev/sda
lmc -m cluster-production.xml --add ost --node node251 --lov cms-lov --ost node251-ost-sdb --fstype ldiskfs --dev /dev/sdb
lmc -m cluster-production.xml --add ost --node node251 --lov cms-lov --ost node251-ost-sdc --fstype ldiskfs --dev /dev/sdc
lmc -m cluster-production.xml --add ost --node node251 --lov cms-lov --ost node251-ost-sdd --fstype ldiskfs --dev /dev/sdd

 

#---------

 

#node250

lmc -m cluster-production.xml --add ost --node node250 --lov cms-lov --ost node250-ost-sda --fstype ldiskfs --dev /dev/sda
lmc -m cluster-production.xml --add ost --node node250 --lov cms-lov --ost node250-ost-sdb --fstype ldiskfs --dev /dev/sdb
lmc -m cluster-production.xml --add ost --node node250 --lov cms-lov --ost node250-ost-sdc --fstype ldiskfs --dev /dev/sdc
lmc -m cluster-production.xml --add ost --node node250 --lov cms-lov --ost node250-ost-sdd --fstype ldiskfs --dev /dev/sdd

 

#---------

 

#node249

lmc -m cluster-production.xml --add ost --node node249 --lov cms-lov --ost node249-ost-sda --fstype ldiskfs --dev /dev/sda
lmc -m cluster-production.xml --add ost --node node249 --lov cms-lov --ost node249-ost-sdb --fstype ldiskfs --dev /dev/sdb
lmc -m cluster-production.xml --add ost --node node249 --lov cms-lov --ost node249-ost-sdc --fstype ldiskfs --dev /dev/sdc
lmc -m cluster-production.xml --add ost --node node249 --lov cms-lov --ost node249-ost-sdd --fstype ldiskfs --dev /dev/sdd

 

#==========

 

 

 

#-----------------

#Configure client |

#-----------------

 

lmc -m cluster-production.xml --add mtpt --node client --path /mnt/cms-lustre --mds cms-mds --lov cms-lov

 

cp cluster-production.xml /cluster-images/rootfs-SL4-x86_64/root/lustre-config/

 

 

#############################

#############################

end

#############################

#############################
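
(As an aside, the per-node OST stanzas above are completely regular, so the same lmc calls could be generated with a loop; this sketch is equivalent to the expanded commands:)

# equivalent to the expanded per-node OST commands above:
# four OSTs (sda..sdd) on each of the five OSS nodes
for n in node253 node252 node251 node250 node249; do
    for d in sda sdb sdc sdd; do
        lmc -m cluster-production.xml --add ost --node $n --lov cms-lov \
            --ost ${n}-ost-${d} --fstype ldiskfs --dev /dev/$d
    done
done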

 

 

Now I run

lconf --reformat --node <node name> cluster-production.xml

on each node and wait a while for everything to format.  Everything completes without error.
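
(For completeness, that per-node step can itself be scripted from the head node; a minimal sketch, assuming passwordless ssh to the slaves and the same config path on every node:)

# reformat the MDS locally, then each OSS node over ssh
lconf --reformat --node osg1 cluster-production.xml
for n in node253 node252 node251 node250 node249; do
    ssh $n lconf --reformat --node $n cluster-production.xml
done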

 

I then run mount.lustre 10.0.0.243:/cms-mds/client /mnt/cms-lustre on the head node (osg1).

 

That also works fine.  I've run a number of tests, and it performs really well.
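
(Side note: the same zero-conf mount can also be written as an /etc/fstab entry, since mount -t lustre just invokes the mount.lustre helper; a sketch, with _netdev added so init waits for the network before mounting:)

10.0.0.243:/cms-mds/client  /mnt/cms-lustre  lustre  defaults,_netdev  0 0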

 

The problem occurs when I attempt to mount the file system on the slave nodes.  When I run the same command (just with a different mount point), I get the following:

 

[EMAIL PROTECTED] ~]# mount.lustre 10.0.0.243:/cms-mds/client /var/writable/cms-lustre/
mount.lustre: mount([EMAIL PROTECTED]:/cms-mds/client, /var/writable/cms-lustre/) failed: No such device
mds nid 0:       [EMAIL PROTECTED]
mds name:        cms-mds
profile:         client
options:
retry:           0
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
[EMAIL PROTECTED] ~]#

 

and this pops up in the error log:

 

Feb 14 04:50:07 localhost kernel: LustreError: 6053:0:(genops.c:224:class_newdev()) OBD: unknown type: osc
Feb 14 04:50:07 localhost kernel: LustreError: 6053:0:(obd_config.c:102:class_attach()) Cannot create device OSC_osg1.<mydomain>_node253-ost-sda_MNT_client-000001011f659c00 of type osc: -19
Feb 14 04:50:07 localhost kernel: LustreError: mdc_dev: The configuration 'client' could not be read from the MDS 'cms-mds'.  This may be the result of communication errors between the client and the MDS, or if the MDS is not running.
Feb 14 04:50:07 localhost kernel: LustreError: 6053:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client
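
(In case it helps with the diagnosis: the "unknown type: osc" line and the hint from mount.lustre both point at the client modules, so on a slave I can check with standard commands -- the module names below are just the ones named in the errors plus the usual Lustre client stack:)

# is lustre registered as a filesystem, and is the client stack loaded?
grep lustre /proc/filesystems
lsmod | grep -E 'lustre|osc|mdc|obdclass|lnet'

# if not, load it by hand (modprobe pulls in the dependencies)
modprobe lustre

(and /etc/modprobe.conf on the slaves needs an lnet line; this one is illustrative only -- the interface name is an assumption:)

options lnet networks=tcp0(eth0)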

 

Any idea what's going on here?

 

Thanks a bunch for the help.

 

-Matt

 
