I don't know whether the Voltaire IB stack is the same as OFED, but I'm guessing it has a subnet manager. Check that it is running. I've had similar issues when my subnet manager crashed.

On Jan 9, 2008, at 3:08 AM, Changer Van wrote:

The network connection is down. I cannot ping the other nodes.
I ran the vstat command and found that one of the ports has a port_state of 'port_initialize'.
What does 'port_initialize' mean? Does it mean my IB card is broken?

1 HCA found:
        hca_id=InfiniHost_III_Ex0
        pci_location={BUS=0x20,DEV/FUNC=0x00}
        vendor_id=0x02C9
        vendor_part_id=0x6282
        hw_ver=0xA0
        fw_ver=5.1.400
        PSID=MT_0140000001
        num_phys_ports=2
                port=1
                port_state=PORT_INITIALIZE
                sm_lid=0x0000
                port_lid=0x0000
                port_lmc=0x00
                max_mtu=2048
                port=2
                port_state=PORT_DOWN
                sm_lid=0x0000
                port_lid=0x0000
                port_lmc=0x00
                max_mtu=2048
--
Regards,
Changer
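
For what it's worth, the port states in the vstat output above can be read mechanically. In InfiniBand, a port that sits in the initialize state with LID 0 normally has a physical link but has not yet been configured by a subnet manager, while a port that is down has no physical link at all. Below is a minimal Python sketch, assuming vstat prints key=value lines as shown above. The function names are hypothetical, and the PORT_ACTIVE string is an assumption, since only PORT_INITIALIZE and PORT_DOWN appear in this thread.

```python
import re

# Port stanzas trimmed from the vstat output posted in this thread.
SAMPLE = """\
        num_phys_ports=2
                port=1
                port_state=PORT_INITIALIZE
                sm_lid=0x0000
                port_lid=0x0000
                port=2
                port_state=PORT_DOWN
                sm_lid=0x0000
                port_lid=0x0000
"""


def parse_vstat_ports(text):
    """Return one dict per 'port=N' stanza found in vstat-style output."""
    ports = []
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)=(\S+)\s*$", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "port":
            # A new stanza starts; later keys belong to this port.
            current = {"port": int(value)}
            ports.append(current)
        elif current is not None:
            current[key] = value
    return ports


def diagnose(port):
    """Give a rough, human-readable reading of one parsed port stanza."""
    state = port.get("port_state", "")
    sm_lid = int(port.get("sm_lid", "0x0"), 16)
    if state == "PORT_ACTIVE":  # assumed spelling, not shown in the thread
        return "link is active"
    if state == "PORT_INITIALIZE":
        if sm_lid == 0:
            return "physical link up, but no subnet manager has configured the port"
        return "physical link up, waiting for the subnet manager to activate the port"
    if state == "PORT_DOWN":
        return "no physical link (check cable, switch port, or unused port)"
    return "unrecognized state: " + state


if __name__ == "__main__":
    for p in parse_vstat_ports(SAMPLE):
        print("port", p["port"], "->", diagnose(p))
```

On a real node you would feed the function the text captured from running vstat rather than the embedded sample.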

On Jan 9, 2008 3:27 AM, Klaus Steden <[EMAIL PROTECTED]> wrote:

If you're using IPoIB, you can use standard TCP/IP diagnostic tools the same way you would on an Ethernet link (ifconfig, ping, traceroute, telnet, etc.)
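
As a small sketch of the same idea from a script: on Linux, the kernel reports an interface's operational state in /sys/class/net/<iface>/operstate, so an IPoIB interface can be checked like any other network device. This assumes the interface is named ipoib0, as in the dmesg output quoted in this thread, and that the running kernel provides the operstate file. The function name and the sysfs_root parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def ipoib_link_state(iface="ipoib0", sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for iface, or 'missing' / 'unknown'."""
    base = Path(sysfs_root) / iface
    if not base.is_dir():
        # The interface does not exist (driver not loaded or wrong name).
        return "missing"
    try:
        return (base / "operstate").read_text().strip()
    except OSError:
        # Older kernels or restricted environments may not expose the file.
        return "unknown"


if __name__ == "__main__":
    print("ipoib0:", ipoib_link_state())
```

Anything other than "up" here is a hint to look at the layer below IP (cabling, port state, subnet manager) before debugging routes or Lustre itself.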

If you're using a copper-to-optical converter in your data path as well, the Emcore MIAs have link lights on them which will tell you if a physical link is present (check the documentation). I know with STP InfiniBand connectors, there is some ambiguity about terminology with some vendors and manufacturers, and the fibre arrangement doesn't provide a lot of wiggle room.

Klaus

On 1/7/08 7:56 PM, "Changer Van" <[EMAIL PROTECTED]> did etch on stone tablets:



On Jan 8, 2008 1:35 AM, Isaac Huang <[EMAIL PROTECTED]> wrote:
On Mon, Jan 07, 2008 at 06:20:52PM +0800, Changer Van wrote:
>    ......
>    # dmesg
>
>    LustreError: 4273:0:(viblnd.c:1890:kibnal_startup())
>
>             Can't find an active port on InfiniHost_III_Ex0

It means that viblnd couldn't find a port whose link state was active
on the HCA InfiniHost_III_Ex0, i.e. no link on the device was usable.

Were there any other error messages from viblnd before this one?
There were no error messages, but there was a related message
like 'ADDRCONF(NETDEV_UP):ipoib0: link is not ready'.
Did you see this problem on just one node?
There are four nodes that cannot mount the Lustre file system.
The other nodes can mount Lustre but get the following error messages:

# dmesg
divert: not allocating divert_blk for non-ethernet device ipoib0
ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
     ip_route_output_key(127.0.0.1) failed

new: ipoib_allow_arp_joins: 1
ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
     ip_route_output_key(11.0.0.4) failed

ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
     ip_route_output_key(11.0.0.4) failed

ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
     ip_route_output_key(11.0.0.4) failed


How can I check the link on the device? Thanks in advance.




_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
[EMAIL PROTECTED]



