Re: [Lustre-discuss] Problem with write_conf

Roger Spellman Tue, 03 Aug 2010 14:01:44 -0700

Nathan,


I started out with IP addresses of 10.2.9.1 (MDS), 10.2.9.2 (standby
MDS), 10.2.9.3 (OSS), and 10.2.9.4 (peer OSS).  I created a single MDT
and a single OST, using the following commands:

 

MDS#  mkfs.lustre --reformat --fsname hss2 --device-size=10000 --mgs
--mdt --mkfsoptions=' -O extents,dir_index,uninit_groups'
--mgsnode=10.2....@o2ib0 /dev/mapper/map0

OSS#  mkfs.lustre --reformat --ost --index=0 --mkfsoptions=' -O
extents,dir_index,uninit_groups ' --fsname hss2 --device-size=100000
--mgsnode=10.2....@o2ib0 /dev/mapper/map0

 

I mounted the MDT and OST, mounted a client, created a few files, then
unmounted the client, unmounted the servers, and rebooted the clients
and servers.

 

Once the servers were back up, I ran the following on the MDS and OSS,
respectively:

 

 

MDS#  tunefs.lustre --erase-param --mgsnode=10.2.9....@o2ib0
--failnode=10.2.9....@o2ib0 /dev/mapper/map0 

OSS#  tunefs.lustre --erase-param --failnode=10.2.9....@o2ib0
--mgsnode=10.2.9....@o2ib0 --mgsnode=10.2.9....@o2ib0 /dev/mapper/map0

 

Then, I removed last_rcvd from the MDT and OST.

 

Then I changed the IP addresses to 10.2.9.201 (MDS), 10.2.9.202 (standby
MDS), 10.2.9.203 (OSS), 10.2.9.204 (peer OSS).

 

I mounted the MDT and OST.  After a short while, I got the following
errors on the MDS:

 

Lustre: 4567:0:(client.c:1464:ptlrpc_expire_one_request()) @@@ Request
x1343087831941136 sent from hss2-OST0000-osc to NID 10.2.9....@o2ib 0s
ago has failed due to network error (5s prior to deadline).
  r...@ffff810213b5e400 x1343087831941136/t0
o8->[email protected]@o2ib:28/4 lens 368/584 e 0 to 1 dl
1280868405 ref 1 fl Rpc:N/0/0 rc 0/0

Lustre: 4568:0:(import.c:517:import_select_connection())
hss2-OST0000-osc: tried all connections, increasing latency to 1s

Lustre: 4567:0:(client.c:1464:ptlrpc_expire_one_request()) @@@ Request
x1343087831941137 sent from hss2-OST0000-osc to NID 10.2....@o2ib 6s ago
has timed out (6s prior to deadline).
  r...@ffff810213b5e400 x1343087831941137/t0
o8->[email protected]@o2ib:28/4 lens 368/584 e 0 to 1 dl
1280868412 ref 2 fl Rpc:N/0/0 rc 0/0

LustreError: 4567:0:(lib-move.c:2441:LNetPut()) Error sending PUT to
12345-10.2.9....@o2ib: -113

 

Note that the old IP address of the old OST (10.2.9.203) is still
listed.  How can I change that?

 

The client is also seeing old IP addresses, this time the MDS's
10.2.9.1:

 

Lustre: Request x55 sent from hss2-MDT0000-mdc-ffff81007981d800 to NID
10.2....@o2ib 5s ago has timed out (limit 5s).

Lustre: Skipped 9 previous similar messages

Lustre: 6433:0:(import.c:507:import_select_connection())
hss2-MDT0000-mdc-ffff81007981d800: tried all connections, increasing
latency to 50s

Lustre: 6433:0:(import.c:507:import_select_connection()) Skipped 4
previous similar messages
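
One way to see which peers a node is still trying to reach is to tally
the NIDs named in the "sent from ... to NID ..." messages. The following
is only a minimal sketch: it assumes the console messages have been
saved to a file with one message per line, and that grep supports -o
(GNU and BSD grep do). The function name list_target_nids is
hypothetical, not part of Lustre.

```shell
# list_target_nids FILE
# Hypothetical helper: count how often each NID appears after "to NID"
# in saved Lustre console messages, most frequent first.
list_target_nids() {
    # Pull out every "to NID <nid>" fragment, keep only the NID itself,
    # then count identical NIDs.
    grep -o 'to NID [^ ]*' "$1" | awk '{print $3}' | sort | uniq -c | sort -rn
}
```

Any NID in the output that still carries a pre-change address points at
a configuration that was not regenerated for that target.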

 

Any help is appreciated.

 

Thanks.

 

-Roger

 

 

 

________________________________

From: Roger Spellman 
Sent: Tuesday, August 03, 2010 4:22 PM
To: 'Nathan Rutman'
Cc: [email protected]
Subject: RE: [Lustre-discuss] Problem with write_conf

 

Nathan,

 

Thanks.  That works great.

 

Are there any tricks involved in also making a non-redundant system
redundant at the same time?  E.g. Can I just do:

 

 

MDS#  tunefs.lustre --erase-param --mgsnode=10.2.9....@o2ib0
--failnode=10.2.9....@o2ib0 /dev/mapper/map0 

OSS#  tunefs.lustre --erase-param --failnode=10.2.9....@o2ib0
--mgsnode=10.2.9....@o2ib0 --mgsnode=10.2.9....@o2ib0 /dev/mapper/map0

 

Is the OSS's NID stored anywhere on the OST?

 

-Roger

 

________________________________

From: Nathan Rutman [mailto:[email protected]] 
Sent: Tuesday, August 03, 2010 4:05 PM
To: Roger Spellman
Cc: [email protected]
Subject: Re: [Lustre-discuss] Problem with write_conf

 

 

On Aug 3, 2010, at 12:49 PM, Roger Spellman wrote:

 

If I change the NIDs, and if I don't remove /mnt/mdt/CONFIGS/*-client,
then I get the following when I try mounting a client (note that
10.2.9.1 is the OLD address):

 

mount.lustre: mount 10.2....@o2ib:/hss2 at /mnt/lustre-hss2 failed:
Cannot send after transport endpoint shutdown

 

Don't mount with the old address :)

This address is not contained in the config log; it is the MGS address
the client must contact in order to GET the config log.  It needs to
point to the current IP of the MGS.  Maybe you've put the old address in
/etc/fstab, or perhaps the DNS resolution of the MGS's common name
hasn't been updated.
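
A quick check for the /etc/fstab case could look like the sketch below.
It assumes client mount entries use the usual "<mgsnid>[:<mgsnid>]:/<fsname>"
device field with filesystem type "lustre"; the function name
find_lustre_mounts_using is hypothetical, and the fstab path is a
parameter so the check can be run against a copy.

```shell
# find_lustre_mounts_using ADDR [FSTAB]
# Hypothetical helper: print fstab entries of type "lustre" whose device
# field names ADDR as the address part of one of its NIDs.
find_lustre_mounts_using() {
    addr=$1
    fstab=${2:-/etc/fstab}
    awk -v addr="$addr" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $3 == "lustre" {
            # Device field: NIDs separated by ":" or ",", then "/fsname".
            n = split($1, parts, /[:,]/)
            for (i = 1; i <= n; i++) {
                sub(/@.*/, "", parts[i])     # drop "@network" suffix
                if (parts[i] == addr) { print; break }
            }
        }
    ' "$fstab"
}
```

Running it with the old MGS address and getting any output means the
client is still being told to contact the old address at mount time.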

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
