On Tue, Jan 11, 2011 at 3:35 PM, Wojciech Turek <[email protected]> wrote:
> Hi Brendon,
>
> Can you please provide following:
> 1) output of ifconfig run on each OSS MDS and at least one client
> 2) output of lctl list_nids run on each OSS MDS and at least one client
> 3) output of tunefs.lustre --print --dryrun /dev/<OST_block_device> from
> each OSS
>
> Wojciech

After someone read the emails I sent out, they grabbed me on IRC. We had a
discussion, and their reading of the email was that everything should be
working and I just needed to wait for a repair to run and complete. What I
then learned is that, first, a client has to connect for a repair to
initiate. Second, the code isn't perfect: the MDS kernel oopsed twice before
it finally completed a repair successfully. I was in the process of
disabling panic on oops when the repair finally went through. Once that was
done, I got a clean bill of health.

Just to complete this discussion, I have listed the requested output below.
I might still learn something :)

...Looks like I did learn something. OSS0 has an issue with its root FS,
which was remounted read-only. I discovered this when running
tunefs.lustre --print --dryrun /dev/sda5 on it. The fun never ends :)

-Brendon

1) ifconfig info

MDS:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:5E:46:64
          inet addr:10.1.1.1  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe5e:4664/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:49140546 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63644404 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18963170801 (17.6 GiB)  TX bytes:65261762295 (60.7 GiB)
          Base address:0xcc00 Memory:f58e0000-f5900000

eth1      Link encap:Ethernet  HWaddr 00:15:17:5E:46:65
          inet addr:192.168.0.181  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe5e:4665/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:236738842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:458503163 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:15562858193 (14.4 GiB)  TX bytes:686167422947 (639.0 GiB)
          Base address:0xc880 Memory:f5880000-f58a0000

OSS:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1D:60:E0:5B:B2
          inet addr:10.1.1.2  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21d:60ff:fee0:5bb2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3092588 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3547204 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1320521551 (1.2 GiB)  TX bytes:2670089148 (2.4 GiB)
          Interrupt:233

client:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1E:8C:39:E4:69
          inet addr:10.1.1.5  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:8cff:fe39:e469/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:727922 errors:0 dropped:0 overruns:0 frame:0
          TX packets:884188 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:433349006 (413.2 MiB)  TX bytes:231985578 (221.2 MiB)
          Interrupt:50

2) lctl list_nids

client:
# lctl list_nids
10.1.1.5@tcp

MDS:
# lctl list_nids
10.1.1.1@tcp

OSS:
# lctl list_nids
10.1.1.2@tcp

3) tunefs.lustre --print --dryrun /dev/sda5

OSS0:
# tunefs.lustre --print --dryrun /dev/sda5
checking for existing Lustre data: found CONFIGS/mountdata
tunefs.lustre: Can't create temporary directory /tmp/dirCZXt3k: Read-only file system

tunefs.lustre FATAL: Failed to read previous Lustre data from /dev/sda5 (30)
tunefs.lustre: exiting with 30 (Read-only file system)

OSS1:
# tunefs.lustre --print --dryrun /dev/sda5
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     mylustre-OST0001
Index:      1
Lustre FS:  mylustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.1@tcp

   Permanent disk data:
Target:     mylustre-OST0001
Index:      1
Lustre FS:  mylustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.1@tcp

exiting before disk write.

OSS2:
# tunefs.lustre --print --dryrun /dev/sda5
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     mylustre-OST0002
Index:      2
Lustre FS:  mylustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.1@tcp

   Permanent disk data:
Target:     mylustre-OST0002
Index:      2
Lustre FS:  mylustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.1@tcp

exiting before disk write.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
