Hi,
Please look at the attached log to see what I mean. The behavior and
dmesg output are similar when I use your sequence of commands.
We have removed the two Mellanox cards from the OSS and installed a
single Voltaire card instead.
This seems to work: I could connect 60 nodes without errors. We will
investigate further to better understand the cause of the original
problem.
Thanks,
Mirko
Eric Barton wrote:
Some of the clients repeatedly hang during the fs mount or when an OST
is added (see log).
Error:
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
from OFED:
enum ib_cm_rej_reason {
IB_CM_REJ_INVALID_SERVICE_ID = 8,
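For anyone scanning dmesg for the same symptom, here is a rough sketch of how the "reason N" field could be pulled out of a kiblnd_rejected() line and mapped to a name. It assumes the log entry sits on one line, and it only knows the single value from the enum excerpt above; every other value is reported as unknown rather than guessed. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: decode the "reason N" field of a kiblnd_rejected() log line.
# Only value 8 comes from the enum excerpt quoted in this thread.

# Map a numeric reject reason to its enum name (hypothetical helper).
rej_reason_name() {
    case "$1" in
        8) echo "IB_CM_REJ_INVALID_SERVICE_ID" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Print the numeric reason found in a log line, or nothing if absent.
extract_rej_reason() {
    printf '%s\n' "$1" | sed -n 's/.*rejected: reason \([0-9][0-9]*\),.*/\1/p'
}

# Combine both steps; returns non-zero when no reason is present.
decode_rejection() {
    reason=$(extract_rej_reason "$1")
    if [ -z "$reason" ]; then
        echo "no rejection reason found"
        return 1
    fi
    rej_reason_name "$reason"
}
```

Calling decode_rejection with the error line from the log should print the enum name for reason 8.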
Once an IPoIB ping to the corresponding OST is started, the client
continues. Afterwards it is quite stable.
Any idea how this could be fixed?
Are you sure the OST is up and running before you attempt to mount it?
Do you get the same error if you do the following on the client before you
try to mount...
1. modprobe lnet
2. lctl net up
3. lctl ping [EMAIL PROTECTED]
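The three steps above could be wrapped in a small helper so they can be run the same way on every client before mounting. This is only a sketch: the commands are copied as written in the list, the server NID is passed in as an argument because it is not shown here, and the DRY_RUN switch and function names are hypothetical additions that just print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: pre-mount LNET check following the three steps listed above.
# DRY_RUN=1 (hypothetical switch) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: lnet_preflight <server-nid>
# Stops at the first step that fails and returns non-zero.
lnet_preflight() {
    nid="$1"
    if [ -z "$nid" ]; then
        echo "usage: lnet_preflight <server-nid>" >&2
        return 2
    fi
    run modprobe lnet || return 1
    run lctl net up || return 1
    run lctl ping "$nid" || return 1
}
```

With DRY_RUN=1 set, the helper only echoes the three commands, which is a cheap way to check the sequence before running it as root.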
Cheers,
Eric
[EMAIL PROTECTED] xiranet]# psh node107-node110 mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs
node107: mount.lustre: mount [EMAIL PROTECTED]:/testfs at /mnt/testfs failed: Input/output error
node107: Is the MGS running?
node109: mount.lustre: mount [EMAIL PROTECTED]:/testfs at /mnt/testfs failed: Input/output error
node109: Is the MGS running?
[EMAIL PROTECTED] xiranet]# psh node107-node110 mount | grep testfs
node110: [EMAIL PROTECTED]:/testfs on /mnt/testfs type lustre (rw)
node108: [EMAIL PROTECTED]:/testfs on /mnt/testfs type lustre (rw)
[EMAIL PROTECTED] xiranet]# psh node107-node110 ping -c 3 10.0.90.9 | grep loss
node110: 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
node107: 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
node109: 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
node108: 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
[EMAIL PROTECTED] xiranet]# psh node107-node110 mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs
node108: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
node110: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
[EMAIL PROTECTED] xiranet]# psh node107-node110 mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs
node110: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
node107: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
node108: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
node109: mount.lustre: according to /etc/mtab [EMAIL PROTECTED]:/testfs is already mounted on /mnt/testfs
[EMAIL PROTECTED] xiranet]#
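To see at a glance which nodes are still missing the mount after a run like the one above, something along these lines might help. It is a sketch only: it reads "psh ... mount" output in the "node: device on mountpoint type lustre" form shown in the session on stdin, and the function name and argument order are made up for illustration.

```shell
#!/bin/sh
# Sketch: list expected nodes that do NOT show a lustre mount at the given
# mount point. Reads "psh <nodes> mount" style output on stdin.
# Usage: missing_mounts <mountpoint> <node>...

missing_mounts() {
    mnt="$1"
    shift
    # Collect node names (first field, trailing colon stripped) from lines
    # that contain "on <mountpoint> type lustre".
    mounted=$(awk -v mnt="$mnt" '{
        for (i = 2; i + 3 <= NF; i++)
            if ($i == "on" && $(i+1) == mnt && $(i+2) == "type" && $(i+3) == "lustre") {
                sub(/:$/, "", $1); print $1; break
            }
    }')
    # Print every expected node that was not seen in the mount output.
    for node in "$@"; do
        case " $(echo $mounted) " in
            *" $node "*) ;;
            *) echo "$node" ;;
        esac
    done
}
```

It could be fed directly from the parallel shell, for example by piping "psh node107-node110 mount" into missing_mounts /mnt/testfs followed by the four node names.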
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss