Some additional information on the problem. I disconnected the Ethernet
connection to the server machine (192.168.1.94) and ran a disk test on
the client (192.168.1.156 via Ethernet), writing to what I thought was
the IB-mounted file system (mount -t lustre
[EMAIL PROTECTED]:/lusty /slut, where 192.168.3.50 is the IP of ib0 on
the server machine).
Looking at /var/log/syslog I saw:
Jun 3 08:18:59 nasnu3 kernel: Lustre: 3625:0:(router.c:167:lnet_notify()) Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 23987983711 seconds in the future
Jun 3 08:19:19 nasnu3 kernel: Lustre: 3623:0:(router.c:167:lnet_notify()) Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 23987983691 seconds in the future
Jun 3 08:19:44 nasnu3 kernel: Lustre: 3622:0:(router.c:167:lnet_notify()) Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 23987983666 seconds in the future
Jun 3 08:20:06 nasnu3 kernel: LustreError: 3660:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1212495506, 100s ago) [EMAIL PROTECTED] x601951/t0 o400->[EMAIL PROTECTED]@tcp:12 lens 128/128 ref 1 fl Rpc:N/0/0 rc 0/-22
Jun 3 08:20:06 nasnu3 kernel: LustreError: 3660:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Jun 3 08:20:06 nasnu3 kernel: Lustre: lusty-MDT0000-mdc-ffff810226207000: Connection to service lusty-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Jun 3 08:20:06 nasnu3 kernel: Lustre: Skipped 1 previous similar message
Jun 3 08:20:09 nasnu3 kernel: Lustre: 3624:0:(router.c:167:lnet_notify()) Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 23987983641 seconds in the future
Jun 3 08:20:09 nasnu3 kernel: Lustre: 3624:0:(router.c:167:lnet_notify()) Skipped 1 previous similar message
Jun 3 08:21:49 nasnu3 kernel: Lustre: 3623:0:(router.c:167:lnet_notify()) Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 23987983541 seconds in the future
Jun 3 08:22:53 nasnu3 kernel: LustreError: 7051:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -4
Jun 3 08:22:53 nasnu3 kernel: LustreError: 7051:0:(file.c:2512:ll_inode_revalidate_fini()) failure -4 inode 10407505
Jun 3 08:23:26 nasnu3 kernel: LustreError: 3661:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1212495706, 100s ago) [EMAIL PROTECTED] x601971/t0 o38->[EMAIL PROTECTED]@tcp:12 lens 304/328 ref 1 fl Rpc:/0/0 rc 0/-22
Jun 3 08:23:26 nasnu3 kernel: LustreError: 3661:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
Jun 3 08:23:26 nasnu3 kernel: LustreError: 167-0: This client was evicted by lusty-MDT0000; in progress operations using this service will fail.
Jun 3 08:23:26 nasnu3 kernel: LustreError: Skipped 1 previous similar message
Jun 3 08:23:26 nasnu3 kernel: LustreError: 7052:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -5
Jun 3 08:23:26 nasnu3 kernel: LustreError: 7052:0:(mdc_locks.c:423:mdc_finish_enqueue()) Skipped 1 previous similar message
Jun 3 08:23:26 nasnu3 kernel: Lustre: lusty-MDT0000-mdc-ffff810226207000: Connection restored to service lusty-MDT0000 using nid [EMAIL PROTECTED]
Jun 3 08:23:26 nasnu3 kernel: Lustre: Skipped 1 previous similar message
Jun 3 08:23:26 nasnu3 kernel: LustreError: 7052:0:(file.c:2512:ll_inode_revalidate_fini()) failure -5 inode 10407505
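The "sent at" values in the timeout messages above are Unix epoch seconds, so they can be checked against the syslog timestamps. The sketch below assumes the client clock runs at UTC-4 (the -0400 offset seen in the quoted mail header; the log itself does not state a time zone). The helper names sent_time and expiry_time are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the client logs in local time at UTC-4 (taken from the
# -0400 offset in the quoted mail header, not from the log itself).
LOCAL = timezone(timedelta(hours=-4))


def sent_time(epoch_seconds: int) -> datetime:
    """Convert the 'sent at' epoch value to assumed local time."""
    return datetime.fromtimestamp(epoch_seconds, tz=LOCAL)


def expiry_time(epoch_seconds: int, age_seconds: int) -> datetime:
    """Time at which a request sent at epoch_seconds is age_seconds old."""
    return sent_time(epoch_seconds) + timedelta(seconds=age_seconds)


# (sent at, "Ns ago") pairs copied from the two timeout messages.
for sent, age in [(1212495506, 100), (1212495706, 100)]:
    print(sent_time(sent).strftime("%b %d %H:%M:%S"),
          "->", expiry_time(sent, age).strftime("%H:%M:%S"))
```

Under that assumption, the two requests were sent at 08:18:26 and 08:21:46 and expired at 08:20:06 and 08:23:26, which matches the timestamps on the corresponding LustreError lines.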
Thanks,
murray
Isaac Huang wrote:
On Mon, Jun 02, 2008 at 01:40:20PM -0400, Murray Smigel wrote:
Hi,
I have built a simple Lustre setup. The MDS and OSS are both on a CentOS 5
machine using the Red Hat Lustre-modified kernel
2.6.18-8.1.14.el5_lustre.1.6.4.1 running OFED-1.3. Lustre is 1.6.4.3.
The client machine is Debian running the same kernel, OFED-1.3, and
Lustre 1.6.4.3.
The MDT and OST are both single partitions on the same disk (yes,
I know this is not optimal...).
The network uses Mellanox ConnectX HCAs running through a Voltaire
ISR2004 switch.
What is your Lustre network configuration (i.e. lnet options)? If not
sure, what's the output of 'lctl list_nids' on the client and the
server?
Isaac
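To illustrate why the 'lctl list_nids' output matters: each NID has the form address@network (for example an @tcp or an @o2ib suffix), and two nodes can only talk over a network type that both of them list. The sketch below compares two outputs. The sample NIDs are hypothetical, built from the IP addresses mentioned in this thread, since the real output was not posted; parse_nid and networks are names invented for this example.

```python
from typing import NamedTuple, Set


class Nid(NamedTuple):
    address: str
    network: str


def parse_nid(text: str) -> Nid:
    """Split a NID such as '192.168.3.50@o2ib' into address and network."""
    address, sep, network = text.strip().partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {text!r}")
    return Nid(address, network)


def networks(list_nids_output: str) -> Set[str]:
    """Collect the network names from one-NID-per-line output."""
    return {parse_nid(line).network
            for line in list_nids_output.splitlines() if line.strip()}


# Hypothetical 'lctl list_nids' outputs (NOT taken from the real machines).
server_nids = "192.168.1.94@tcp\n192.168.3.50@o2ib\n"
client_nids = "192.168.1.156@tcp\n"

shared = networks(server_nids) & networks(client_nids)
print("networks both nodes can use:", sorted(shared))
```

In this made-up case only tcp is shared, so all Lustre traffic would go over Ethernet even though the IB link itself works. Whether that is what happens here depends on the actual output.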
The basic RDMA setup seems to work in either direction:
[EMAIL PROTECTED]:/slut$ ib_rdma_bw 192.168.3.50 (Lustre server)
5605: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
5605: Local address: LID 0x07, QPN 0x22004e, PSN 0x323aa6 RKey 0x1a002800 VAddr 0x002aaaaaad6000
5605: Remote address: LID 0x05, QPN 0x8004f, PSN 0x67c28c, RKey 0x8002800 VAddr 0x002aaaab705000
5605: Bandwidth peak (#0 to #985): 1332.53 MB/sec
5605: Bandwidth average: 1332.47 MB/sec
5605: Service Demand peak (#0 to #985): 1462 cycles/KB
5605: Service Demand Avg : 1462 cycles/KB
[EMAIL PROTECTED] bin]$ ib_rdma_bw 192.168.3.30 (Lustre client)
3845: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
3845: Local address: LID 0x05, QPN 0xa004f, PSN 0x4f4712 RKey 0xa002800 VAddr 0x002aaaab705000
3845: Remote address: LID 0x07, QPN 0x24004e, PSN 0xa740c1, RKey 0x1c002800 VAddr 0x002aaaaaad6000
3845: Bandwidth peak (#0 to #956): 1533.5 MB/sec
3845: Bandwidth average: 1533.43 MB/sec
3845: Service Demand peak (#0 to #956): 1146 cycles/KB
3845: Service Demand Avg : 1146 cycles/KB
Local disk speed on the Lustre server seems fine, as does the speed when
the server machine writes to the Lustre-mounted drive (50-80 MB/s):
[EMAIL PROTECTED] slut]$ dd if=/dev/zero of=foo bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 13.5875 seconds, 79.0 MB/s
Performance of the client machine writing to the Lustre drive is poor
(12 MB/s):
[EMAIL PROTECTED]:/slut$ mount -t lustre -l
[EMAIL PROTECTED]:/lusty on /slut type lustre (rw)
[EMAIL PROTECTED]:/slut$ dd if=/dev/zero of=foo bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 88.9857 seconds, 12.1 MB/s
Testing with Bonnie++ gives similar results.
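As a sanity check on the two dd runs, the throughput can be recomputed from the byte counts and elapsed times that dd printed. The sketch below also compares the client figure with the raw rate of a 100 Mbit/s Ethernet link; that link speed is an assumption made only for comparison, since the speed of the client's Ethernet port is not stated anywhere in this thread.

```python
BYTES_WRITTEN = 1073741824  # 1024 records of 1048576 bytes, as reported by dd


def mb_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in MB/s, using MB = 10**6 bytes as dd does."""
    return nbytes / seconds / 1e6


server = mb_per_s(BYTES_WRITTEN, 13.5875)  # dd run on the Lustre server
client = mb_per_s(BYTES_WRITTEN, 88.9857)  # dd run on the client

# Assumption for comparison only: a 100 Mbit/s Ethernet link.
FAST_ETHERNET_MB_S = 100e6 / 8 / 1e6  # 12.5 MB/s raw line rate

print(f"server: {server:.1f} MB/s")
print(f"client: {client:.1f} MB/s")
print(f"slowdown: {server / client:.1f}x")
print(f"client / 100 Mbit line rate: {client / FAST_ETHERNET_MB_S:.2f}")
```

The client rate comes out at roughly 97% of 12.5 MB/s. That would be consistent with the writes travelling over a 100 Mbit/s Ethernet path rather than over IB, but it is not proof of it.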
Any ideas as to what might be going on?
Thanks,
murray
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss