On Sep 05, 2008 11:11 -0400, Aaron Knister wrote:
> Does the new MDS actually have an MGS running?  FYI, you only need one
> MGS per Lustre setup.  In the commands you issued, it doesn't look like
> you actually set up an MGS on the host "mds2".  Can you run "lctl dl"
> on mds2 and send the output?
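> If an MGS is actually running there, the device list will include an mgs
> device.  Purely illustrative output (the NID, UUID and reference counts
> below are made up):
>
>   # lctl dl
>   0 UP mgs MGS MGS 5
>   1 UP mgc MGC192.168.16.1@o2ib 59b37b26-... 5
>   2 UP mdt MDS MDS_uuid 3
>
> If no mgs line shows up, the MGS was never started on that node.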

There are tradeoffs between having a single MGS for multiple filesystems,
and having one MGS per filesystem (assuming different MDS nodes).  In
general, there isn't much benefit to sharing an MGS between multiple MDS
nodes, and the drawback is that it is a single point of failure, so you
may as well have one per MDS.
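For a single filesystem, the simplest layout is a combined MGS+MDT, which
avoids network registration for the MDT entirely.  Roughly, the two options
at format time look like this (the NID below is hypothetical):

  # combined MGS and MDT on one device
  mkfs.lustre --fsname=crew8 --mgs --mdt /dev/sdd1

  # or a standalone MGS, with the MDT registering to it
  mkfs.lustre --mgs /dev/sdc1
  mkfs.lustre --fsname=crew8 --mdt --mgsnode=192.168.16.1@o2ib /dev/sdd1

With the combined layout, mounting the MDT also starts the MGS, so there is
no separate MGS to mount first.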

> On Sep 4, 2008, at 4:54 PM, Ms. Megan Larko wrote:
> 
> > Hi,
> >
> > I have a new MGS/MDS that I would like to start.  It runs the same
> > CentOS 5 kernel (2.6.18-53.1.13.el5) and lustre-1.6.4.3smp as my
> > other boxes.  Initially it had an IP address that was already used
> > elsewhere in our group, so I changed it using the tunefs.lustre
> > command below for the new MDT.
> >
> > [EMAIL PROTECTED] ~]# tunefs.lustre --erase-params --writeconf [EMAIL PROTECTED] /dev/sdd1
> > checking for existing Lustre data: found CONFIGS/mountdata
> > Reading CONFIGS/mountdata
> >
> >   Read previous values:
> > Target:     crew8-MDTffff
> > Index:      unassigned
> > Lustre FS:  crew8
> > Mount type: ldiskfs
> > Flags:      0x71
> >              (MDT needs_index first_time update )
> > Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > Parameters: [EMAIL PROTECTED]
> >
> >
> >   Permanent disk data:
> > Target:     crew8-MDTffff
> > Index:      unassigned
> > Lustre FS:  crew8
> > Mount type: ldiskfs
> > Flags:      0x171
> >              (MDT needs_index first_time update writeconf )
> > Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > Parameters: [EMAIL PROTECTED]
> >
> > Writing CONFIGS/mountdata
> >
> > Next I try to mount this new MDT onto the system....
> > [EMAIL PROTECTED] ~]# mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT0000
> > mount.lustre: mount /dev/sdd1 at /srv/lustre/mds/crew8-MDT0000 failed:
> > Input/output error
> > Is the MGS running?
> >
> > Ummm... yeah, I thought the MGS was running.
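> > (As I understand it, mount.lustre decides where to register based on
> > the mgsnode parameter stored on the target itself, which can be
> > inspected without changing anything:
> >
> >   tunefs.lustre --print /dev/sdd1
> >
> > so perhaps the stored parameter still points somewhere stale.)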
> >
> > [EMAIL PROTECTED] ~]# tail /var/log/messages
> > Sep  4 16:28:08 mds2 kernel: LDISKFS-fs: mounted filesystem with
> > ordered data mode.
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3526:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at
> > 1220560088, 5s ago)  [EMAIL PROTECTED] x3/t0
> > o250->[EMAIL PROTECTED]@o2ib_0:26 lens 240/272 ref 1 fl Rpc:/0/0 rc
> > 0/-22
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:954:server_register_target()) registration with
> > the MGS failed (-5)
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:1054:server_start_targets()) Required registration
> > failed for crew8-MDTffff: -5
> > Sep  4 16:28:13 mds2 kernel: LustreError: 15f-b: Communication error
> > with the MGS.  Is the MGS running?
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets:
> > -5
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:1368:server_put_super()) no obd crew8-MDTffff
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:119:server_deregister_mount()) crew8-MDTffff not
> > registered
> > Sep  4 16:28:13 mds2 kernel: Lustre: server umount crew8-MDTffff  
> > complete
> > Sep  4 16:28:13 mds2 kernel: LustreError:
> > 3797:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-5)
> >
> > The o2ib network is up.  It is pingable both from the shell and via
> > lctl, and I can reach this node from itself and from other computers
> > on the local subnet.
> >
> > [EMAIL PROTECTED] ~]# lctl
> > lctl > ping [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > lctl > ping [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > lctl > quit
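> > For completeness, my understanding is that the NIDs a node advertises
> > for itself can be listed with:
> >
> >   lctl list_nids
> >
> > which might be another thing worth comparing between the two servers.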
> >
> > There are no firewalls on this network, since the computers use only
> > non-routable IP addresses, so there is no firewall issue that I am
> > aware of...
> > [EMAIL PROTECTED] ~]# iptables -L
> > -bash: iptables: command not found
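> > Since the iptables userspace tool is not even installed, the only
> > other firewall check I know of is whether the netfilter modules are
> > loaded:
> >
> >   lsmod | grep -E 'ip_tables|ip6_tables'
> >
> > If that prints nothing, iptables filtering cannot be active.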
> >
> > The only oddity I have found is that the modules on my working
> > MGS/MDS show higher use counts than the same modules on my new
> > MGS/MDT.
> >
> > Correctly functioning MGS/MDT:
> > [EMAIL PROTECTED] ~]# lsmod | grep mgs
> > mgs                   181512  1
> > mgc                    86744  2 mgs
> > ptlrpc                659512  8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
> > obdclass              542200  13
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
> > lvfs                   84712  12
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> > [EMAIL PROTECTED] ~]# lsmod | grep osc
> > osc                   172136  11
> > ptlrpc                659512  8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
> > obdclass              542200  13
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
> > lvfs                   84712  12
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> > [EMAIL PROTECTED] ~]# lsmod | grep lnet
> > lnet                  255656  4 lustre,ko2iblnd,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> >
> > Failing MGS/MDT:
> > [EMAIL PROTECTED] ~]# lsmod | grep mgs
> > mgs                   181512  0
> > mgc                    86744  1 mgs
> > ptlrpc                659512  8 osc,lustre,lov,mdc,mds,lquota,mgs,mgc
> > obdclass              542200  10
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc
> > lvfs                   84712  12
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> > [EMAIL PROTECTED] ~]# lsmod | grep osc
> > osc                   172136  0
> > ptlrpc                659512  8 osc,lustre,lov,mdc,mds,lquota,mgs,mgc
> > obdclass              542200  10
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc
> > lvfs                   84712  12
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> > [EMAIL PROTECTED] ~]# lsmod | grep lnet
> > lnet                  255656  4 lustre,ko2iblnd,ptlrpc,obdclass
> > libcfs                183128  14
> > osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> >
> > The failing MGS/MDT shows a use count of 0 next to mgs, not 1 as on
> > the working MGS/MDT, and osc shows 11 on the working server but 0 on
> > the non-working one.  The lnet counts are the same, as are most of
> > the other modules.  Am I missing something at the mgs/mgc/osc module
> > level, or do those counts simply indicate that the modules are
> > actually in use on my good MGS/MDT?
> >
> > Even setting the IB cabling aside (I am working on the MGS/MDS
> > itself), why can I not mount a new MDT?  Why do I see the message
> > "Is the MGS running?" when I am actually on the MGS/MDS itself?
> >
> > Also, I receive the same result if I attempt to mount an OST on an
> > OSS that points to this new MGS/MDT.  The OST won't even mount
> > locally on the OSS without successful communication with its
> > associated MGS/MDT.
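> > For reference, the order I am attempting is the usual one, MGS/MDT
> > first and then the OSTs (the OST device and mount point here are
> > just illustrative):
> >
> >   mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT0000   # on mds2
> >   mount -t lustre /dev/sdb1 /srv/lustre/ost/crew8-OST0000   # on the OSS
> >
> > so the OST failure looks like a symptom of the same registration
> > problem.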
> >
> > Any and all suggestions gratefully appreciated.
> >
> > megan

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
