I don't know if the Voltaire IB stack is the same as OFED, but I'm
guessing it has a subnet manager. Check that it is running. I've had
similar issues when my subnet manager crashed.
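As a rough sketch of that check, the Python below parses vstat-style
key=value output and flags a port that is in PORT_INITIALIZE with
sm_lid=0x0000, which is what you would expect when the physical link is
up but no subnet manager has configured the port. The sample text is
copied from the vstat output quoted below. The function names
parse_vstat and diagnose are made up for illustration, and the
PORT_ACTIVE state name is an assumption, since only PORT_INITIALIZE and
PORT_DOWN appear in this thread.

```python
# Sample copied from the vstat output in this thread (port sections only).
SAMPLE = """\
hca_id=InfiniHost_III_Ex0
num_phys_ports=2
port=1
port_state=PORT_INITIALIZE
sm_lid=0x0000
port_lid=0x0000
port=2
port_state=PORT_DOWN
sm_lid=0x0000
port_lid=0x0000
"""


def parse_vstat(text):
    """Return {port_number: {field: value}} from vstat-style key=value lines."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()          # tolerate indented output
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == "port":
            current = int(value)    # start of a new per-port section
            ports[current] = {}
        elif current is not None:
            ports[current][key] = value
    return ports


def diagnose(fields):
    """Give a one-line reading of a port's state (hypothetical helper)."""
    state = fields.get("port_state", "")
    sm_lid = int(fields.get("sm_lid", "0x0"), 16)
    if state == "PORT_ACTIVE":      # assumed name for the active state
        return "active"
    if state == "PORT_INITIALIZE" and sm_lid == 0:
        return "physical link up, but no subnet manager has configured the port"
    if state == "PORT_DOWN":
        return "no physical link"
    return "unrecognized state: " + state


report = {port: diagnose(fields) for port, fields in parse_vstat(SAMPLE).items()}
for port, verdict in sorted(report.items()):
    print("port %d: %s" % (port, verdict))
```

If port 1 reads as waiting on a subnet manager, the card itself is
probably fine and the thing to look at is whether the subnet manager
is up.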
On Jan 9, 2008, at 3:08 AM, Changer Van wrote:
The network connection is down. I cannot ping the other nodes.
I ran the vstat command and found that the port_state of one port is
'PORT_INITIALIZE'.
What does 'PORT_INITIALIZE' mean? Does it mean my IB card is broken?
1 HCA found:
    hca_id=InfiniHost_III_Ex0
    pci_location={BUS=0x20,DEV/FUNC=0x00}
    vendor_id=0x02C9
    vendor_part_id=0x6282
    hw_ver=0xA0
    fw_ver=5.1.400
    PSID=MT_0140000001
    num_phys_ports=2
        port=1
        port_state=PORT_INITIALIZE
        sm_lid=0x0000
        port_lid=0x0000
        port_lmc=0x00
        max_mtu=2048

        port=2
        port_state=PORT_DOWN
        sm_lid=0x0000
        port_lid=0x0000
        port_lmc=0x00
        max_mtu=2048
--
Regards,
Changer
On Jan 9, 2008 3:27 AM, Klaus Steden <[EMAIL PROTECTED]> wrote:
If you're using IPoIB, you can use standard TCP/IP diagnostic tools
the same way you would on an Ethernet link (ifconfig, ping,
traceroute, telnet, etc.)
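A minimal sketch of that kind of check, assuming a Linux ping that
accepts -c (count) and -W (timeout in seconds). The function names and
the runner parameter are made up for illustration, and the addresses in
the usage comment are placeholders on the 11.0.0.x network seen in the
dmesg output in this thread.

```python
import subprocess


def reachable(host, runner=subprocess.run):
    """Return True if a single ping to host succeeds (exit status 0)."""
    result = runner(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_hosts(hosts, runner=subprocess.run):
    """Return the subset of hosts that did not answer a ping, in order."""
    return [host for host in hosts if not reachable(host, runner)]


# Usage (placeholder IPoIB addresses):
#   print(unreachable_hosts(["11.0.0.1", "11.0.0.4"]))
```

The runner argument is only there so the logic can be exercised without
touching the network; in normal use the default subprocess.run is what
actually invokes ping.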
If you're using a copper-to-optical converter in your data path as
well, the Emcore MIAs have link lights on them which will tell you
if a physical link is present (check the documentation). I know with
STP InfiniBand connectors, there is some ambiguity about terminology
with some vendors and manufacturers, and the fibre arrangement
doesn't provide a lot of wiggle room.
Klaus
On 1/7/08 7:56 PM, "Changer Van" <[EMAIL PROTECTED]> did etch on
stone tablets:
On Jan 8, 2008 1:35 AM, Isaac Huang <[EMAIL PROTECTED]> wrote:
On Mon, Jan 07, 2008 at 06:20:52PM +0800, Changer Van wrote:
> ......
> # dmesg
> LustreError: 4273:0:(viblnd.c:1890:kibnal_startup())
> Can't find an active port on InfiniHost_III_Ex0
It means that viblnd couldn't find a port whose link state was active
on the HCA InfiniHost_III_Ex0, i.e., no link on the device was usable.
Were there any other error messages from viblnd before this one?
There were no other error messages, only a related message
like 'ADDRCONF(NETDEV_UP):ipoib0: link is not ready'.
Did you see this problem on just one node?
There are four nodes which cannot mount the Lustre file system.
The other nodes can mount Lustre but get the following error
messages:
# dmesg
divert: not allocating divert_blk for non-ethernet device ipoib0
ERROR : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
ip_route_output_key(127.0.0.1) failed
new: ipoib_allow_arp_joins: 1
ERROR : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
ip_route_output_key(11.0.0.4) failed
ERROR : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
ip_route_output_key(11.0.0.4) failed
ERROR : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
ip_route_output_key(11.0.0.4) failed
How can I check the link on the device? Thanks in advance.
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
[EMAIL PROTECTED]