Kevin Van Maren wrote:
> It is not clear whether you have a library path issue, as you
> trimmed too much from the strace.  I would say not: if you did, you
> would get an exec error about not being able to find a shared library,
> not "nothing".
>
> Rather, it sounds like the driver is not loading properly.  ibstat
> should work even without a subnet manager running.
>
> This could very well have been caused by loading the wrong
> firmware onto that card.  Given that the PSID was different, are you
> sure you flashed the right firmware for that card?
>
> Kevin
You're right, the driver is not loading:
mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
mlx4_core: Initializing 0000:03:00.0
mlx4_core 0000:03:00.0: command 0x13 failed: fw status = 0x1
mlx4_core 0000:03:00.0: SW2HW_EQ failed (-5)
mlx4_core 0000:03:00.0: Failed to initialize event queue table, aborting.
mlx4_core: probe of 0000:03:00.0 failed with error -5

I also can't access the card with mstflint:

[root@Lidia ~]# lspci | grep Mell
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)

[root@Lidia ~]# mstflint -d 03:00.0 v
Warning: memory access to device 03:00.0 failed: Input/output error
Warning: Fallback on IO: much slower, and unsafe if device in use.
*** ERROR *** Can not open 03:00.0: Not a directory MFE_CR_ERROR


However, when I boot an Ubuntu live CD, I can see the card with mstflint:
http://img29.imageshack.us/img29/9251/mstflint.png

I installed the firmware from here:
http://www.mellanox.com/content/pages.php?pg=firmware_table_Sun

My card is a 375-3549 (X4217A-Z) whose original PSID was SUN0070000001.

I also think there is something odd about this firmware.

Do you know where I can get the right one?

Thanks,

Yann


>
>
> Rhys McMurdo wrote:
>> Hi Yann,
>>
>> Firstly, this probably isn't the best list for these questions.
>> There is a mailing list for the Linux HPC software stack at
>> linux_hpc_swstack at lists.lustre.org
>>
>> Secondly, if I had to guess at your problem, it looks like either no
>> OpenSM daemon is running or your library paths are not right.
>>
>> Check the opensmd status via /etc/init.d/opensmd status. Also, what 
>> does ldd /usr/sbin/ibstat show?
>>
>> Regards,
>>
>> Rhys
>>
>> 2009/9/21 Yann JOBIC <jobic at polytech.univ-mrs.fr>
>>
>>     Hello,
>>
>>     I have two X4600 servers running CentOS 5.3, with the latest
>>     firmware for 375-3549 cards from the Mellanox website (Sun cards
>>     section) and the Sun HPC Software for Linux.
>>
>>     When I run ibstat to check the health of my InfiniBand cards,
>>     I get no output at all.
>>     When I run it under strace to see what happens, I get:
>>
>>     [root@Lidia ~]# strace ibstat
>>     execve("/usr/sbin/ibstat", ["ibstat"], [/* 34 vars */]) = 0
>>     brk(0)                                  = 0x1bb06000
>>     mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b47a0ae0000
>>     uname({sys="Linux", node="Lidia", ...}) = 0
>>     access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/tls/x86_64/libopensm.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     stat("/usr/mpi/gnu/ClusterTools-8.2/lib/64/tls/x86_64", 0x7fff09fc8b80) = -1 ENOENT (No such file or directory)
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/tls/libopensm.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     stat("/usr/mpi/gnu/ClusterTools-8.2/lib/64/tls", 0x7fff09fc8b80) = -1 ENOENT (No such file or directory)
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/x86_64/libopensm.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     stat("/usr/mpi/gnu/ClusterTools-8.2/lib/64/x86_64", 0x7fff09fc8b80) = -1 ENOENT (No such file or directory)
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/libopensm.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     stat("/usr/mpi/gnu/ClusterTools-8.2/lib/64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>>     [....]
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/libosmvendor.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     [...]
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/libosmcomp.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     [...]
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/libibmad.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     [...]
>>     open("/usr/mpi/gnu/ClusterTools-8.2/lib/64/libibumad.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
>>     [...]
>>
>>     It can't find some other files either.
>>
>>     When I flashed the firmware, I got this warning:
>>
>>     [root@Lilou ~]# mstflint -d 03:00.0 -i fw-25408-2_6_000-375-3549-01.bin b
>>
>>       Current FW version on flash:  2.5.100
>>       New FW version:               2.6.0
>>
>>       You are about to replace current PSID on flash - "SUN0070000001" with a different PSID - "SUN0070130001".
>>       Note: It is highly recommended not to change the PSID.
>>
>>     Do you want to continue ? (y/n) [n] : y
>>
>>     Burning second FW image without signatures  - OK
>>     Restoring second signature                  - OK
>>
>>     I followed the deployment documentation. Did I miss something?
>>     Has anybody had this kind of problem?
>>
>>     Thanks,
>>
>>     Yann
>>
>>
>>
>>     -- 
>>     ___________________________
>>
>>     Yann JOBIC
>>     HPC engineer
>>     Polytech Marseille DME
>>     IUSTI-CNRS UMR 6595
>>     Technopôle de Château Gombert
>>     5 rue Enrico Fermi
>>     13453 Marseille cedex 13
>>     Tel : (33) 4 91 10 69 39
>>      ou  (33) 4 91 10 69 43
>>     Fax : (33) 4 91 10 69 69
>>     _______________________________________________
>>     hpcdev-discuss mailing list
>>     hpcdev-discuss at opensolaris.org
>>     http://mail.opensolaris.org/mailman/listinfo/hpcdev-discuss
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Linux_hpc_swstack mailing list
>> Linux_hpc_swstack at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack
>>   
>

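Kevin's reasoning about the strace output can be illustrated with a simplified model of the dynamic loader's search: the loader tries candidate directories in order, so the ENOENT results for the ClusterTools-8.2 directories are expected as long as a later directory holds the library. Only when every directory misses does the program refuse to start, with a "cannot open shared object file" error rather than silently printing nothing. This is a minimal Python sketch under simplifying assumptions: the real ld.so also consults RPATH/RUNPATH, /etc/ld.so.cache and hardware-capability subdirectories such as tls/ and x86_64/, all ignored here, and `resolve_library` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os
from typing import List, Tuple


def resolve_library(name: str, search_dirs: List[str]) -> Tuple[str, List[str]]:
    """Simplified model of an ordered shared-library search (hypothetical helper).

    Returns the first existing path plus every path that was tried.
    Misses in earlier directories are harmless; only a miss in every
    directory is fatal, which is what stops a program from starting.
    """
    tried: List[str] = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        tried.append(candidate)
        if os.path.isfile(candidate):
            return candidate, tried
    # Modelled on the glibc loader's message when no directory has the library.
    raise FileNotFoundError(
        f"error while loading shared libraries: {name}: "
        "cannot open shared object file: No such file or directory"
    )


if __name__ == "__main__":
    # Assumed search order: the ClusterTools directory seen in the strace
    # first, then a standard system library directory.
    dirs = ["/usr/mpi/gnu/ClusterTools-8.2/lib/64", "/usr/lib64"]
    try:
        path, tried = resolve_library("libibumad.so.1", dirs)
        print("found:", path, "after trying", len(tried), "location(s)")
    except FileNotFoundError as err:
        print(err)
```

In this model, a run of ENOENT lines followed by a successful open elsewhere is normal, which is why the trimmed strace alone does not show a library path problem.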

